Commit 08d9697

Add LINK_RESET recovery action to documentation
Summary: TF2.6 only
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, #tech_docs, kamil.andrzejewski
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, kamil.andrzejewski
Subscribers: jayniep
Maniphest Tasks: T55639
JIRA Issues: AFS-235
Differential Revision: https://phabricator.sourcevertex.net/D83780
1 parent cb9502a commit 08d9697

1 file changed, +12 -3 lines changed

tensorflow/compiler/plugin/poplar/docs/device_selection.rst

Lines changed: 12 additions & 3 deletions
@@ -447,9 +447,16 @@ These runtime errors are handled in the following manner:
 * ``application_runtime_error`` - a ``tensorflow.errors.InternalError`` error
   is raised. The error message contains the reason why the error occurred. An
   IPU reset will be performed before the next execution of a Poplar program.
-* ``recoverable_runtime_error`` with a recovery action ``poplar::RecoveryAction::IPU_RESET`` - a ``tensorflow.errors.InternalError`` error
-  is raised. The error message contains the reason why the error occurred. An
-  IPU reset will be performed before the next execution of a Poplar program.
+* ``recoverable_runtime_error`` - a ``tensorflow.errors.InternalError`` error
+  is raised. The error message contains the reason why the error occurred
+  and a `recovery_action` string attribute.
+  This attribute can contain:
+
+  - `IPU_RESET`: An IPU reset will be performed before the next execution of a Poplar program.
+  - `LINK_RESET`: Reset the IPU-Links in a non-Pod system. This retrains the IPU-Links between IPUs.
+  - `PARTITION_RESET`: Reset the IPU partition in a Pod system. This retrains the IPU-Links between IPUs.
+  - `FULL_RESET`: Power cycle the system.
+
 * Unknown runtime errors - a ``tensorflow.errors.Unknown`` error
   is raised. The error message might contain the reason why the error occurred.
   When these errors occur manual intervention is required before the system is
@@ -459,3 +466,5 @@ These runtime errors are handled in the following manner:
   When these errors occur manual intervention might be required before the
   system is operational again. The error message might contain a required
   recovery action.
+
+
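
The documentation change says that a ``recoverable_runtime_error`` surfaces as a ``tensorflow.errors.InternalError`` whose message carries a recovery action. A minimal sketch of how calling code might pick that action out of the message text is shown below. The exact message format is not given in this commit, so the sketch assumes the action name appears verbatim in the message; `find_recovery_action` and `GUIDANCE` are hypothetical names introduced for illustration only.

```python
import re
from typing import Optional

# Recovery actions listed in the documentation change above.
RECOVERY_ACTIONS = ("IPU_RESET", "LINK_RESET", "PARTITION_RESET", "FULL_RESET")

# Hypothetical operator guidance, paraphrasing the documented descriptions.
GUIDANCE = {
    "IPU_RESET": "An IPU reset is performed before the next Poplar program runs.",
    "LINK_RESET": "Reset the IPU-Links (non-Pod system).",
    "PARTITION_RESET": "Reset the IPU partition (Pod system).",
    "FULL_RESET": "Power cycle the system.",
}


def find_recovery_action(message: str) -> Optional[str]:
    """Return the first known recovery action named in `message`, or None.

    Assumption: the action name appears verbatim in the error message.
    """
    pattern = r"\b(" + "|".join(RECOVERY_ACTIONS) + r")\b"
    match = re.search(pattern, message)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up message used only to exercise the helper.
    example = "recoverable_runtime_error: link down, recovery_action: LINK_RESET"
    action = find_recovery_action(example)
    print(action, "->", GUIDANCE.get(action, "No known recovery action found."))
```

In real code the message string would presumably come from the caught exception (for example the `message` attribute of the `InternalError`), and a return value of `None` could be treated the same way as the unknown runtime errors described above.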
