39 changes: 39 additions & 0 deletions docs/sphinx/api/qec/python_realtime_decoding_api.rst
@@ -72,6 +72,45 @@ Configuration API

The configuration API enables setting up decoders before circuit execution. Decoders are configured using YAML files or programmatically constructed configuration objects.

Configuration Types
^^^^^^^^^^^^^^^^^^^

.. py:class:: cudaq_qec.trt_decoder_config

Configuration for the TensorRT decoder in the real-time decoding system.

**Attributes:**

.. py:attribute:: onnx_load_path
:type: Optional[str]

Path to ONNX model file. Mutually exclusive with engine_load_path.

.. py:attribute:: engine_load_path
:type: Optional[str]

Path to pre-built TensorRT engine file. Mutually exclusive with
onnx_load_path.

.. py:attribute:: engine_save_path
:type: Optional[str]

Path to save built TensorRT engine for reuse.

.. py:attribute:: precision
:type: Optional[str]

Inference precision mode: "fp16", "bf16", "int8", "fp8", "tf32",
"noTF32", or "best" (default).

.. py:attribute:: memory_workspace
:type: Optional[int]

Workspace memory size in bytes (default: 1073741824 = 1GB).
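The attributes above map one-to-one onto plain key/value settings. As an illustrative sketch (a plain Python dict standing in for the real config object, which is constructed through the cudaq_qec API), the same configuration might be expressed as:

```python
# Illustrative parameter set mirroring the trt_decoder_config attributes
# documented above. This is a plain dict, not the library's config type.
trt_params = {
    "onnx_load_path": "decoder_model.onnx",      # mutually exclusive with engine_load_path
    "engine_save_path": "decoder_model.engine",  # cache the built engine for reuse
    "precision": "fp16",                         # fp16/bf16/int8/fp8/tf32/noTF32/best
    "memory_workspace": 1 << 30,                 # 1073741824 bytes (the default)
}
```

Because ``onnx_load_path`` is set, ``engine_load_path`` is deliberately absent: only one model source may be given.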

Configuration Functions
^^^^^^^^^^^^^^^^^^^^^^^^

.. py:function:: cudaq_qec.configure_decoders(config)

Configure decoders from a multi_decoder_config object.
25 changes: 25 additions & 0 deletions docs/sphinx/api/qec/trt_decoder_api.rst
@@ -10,6 +10,11 @@
architecture and supports various precision modes (FP16, BF16, INT8, FP8)
to balance accuracy and speed.

Neural network-based decoders can be trained to perform syndrome decoding
for specific quantum error correction codes and noise models. The TRT decoder
provides a high-performance inference engine for these models, with automatic
CUDA graph optimization for reduced latency.

Requires a CUDA-capable GPU and TensorRT installation. See the `CUDA-Q GPU
Compatibility List
<https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#dependencies-and-compatibility>`_
@@ -80,6 +85,13 @@
only required to satisfy the decoder interface. You can pass any valid
parity check matrix of appropriate dimensions.

.. note::
**Batch Processing**: The TRT decoder automatically handles batch size
optimization. Models trained with batch_size > 1 will receive
zero-padded inputs when using `decode()` on a single syndrome. When
using `decode_batch()`, provide syndromes in multiples of the model's
batch size for optimal performance.
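The zero-padding behavior described in the note can be sketched in pure Python (illustrative only; the actual padding happens inside the decoder, and ``pad_single_syndrome`` is a hypothetical helper, not part of the library):

```python
def pad_single_syndrome(syndrome: list, model_batch_size: int) -> list:
    """Embed one syndrome in a zero-padded batch, as decode() does for
    models trained with batch_size > 1. Rows 1..batch_size-1 are zeros."""
    padding = [[0] * len(syndrome) for _ in range(model_batch_size - 1)]
    return [syndrome] + padding
```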

:param H: Parity check matrix (tensor format). Note: This parameter is not
used by the TRT decoder but is required by the decoder interface.
:param params: Heterogeneous map of parameters:
@@ -116,3 +128,16 @@
engine building (defaults to 1GB = 1073741824 bytes). Larger workspaces
may allow TensorRT to explore more optimization strategies.

- `use_cuda_graph` (bool): Enable CUDA graph optimization for improved
performance (defaults to True). CUDA graphs capture inference operations
and replay them with reduced kernel launch overhead, providing ~20%
speedup. The optimization is applied automatically on the first decode
call. Automatically disabled for models with dynamic shapes or
multiple optimization profiles. Set to False to force traditional
execution path.

- `batch_size` (automatic): The decoder automatically detects the model's
batch size from the first input dimension. For models with batch_size > 1,
the `decode()` method automatically zero-pads single syndromes to fill
the batch. The `decode_batch()` method requires the number of syndromes
to be an integral multiple of the model's batch size.
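The `decode_batch()` size constraint above can be made concrete with a small pure-Python check (illustrative only; ``validate_batch_count`` is a hypothetical helper, not a library function):

```python
def validate_batch_count(num_syndromes: int, model_batch_size: int) -> int:
    """Check the decode_batch() precondition described above: the number
    of syndromes must be an integral multiple of the model's batch size.
    Returns how many full batches the engine would run."""
    if num_syndromes % model_batch_size != 0:
        raise ValueError(
            f"expected a multiple of {model_batch_size} syndromes, "
            f"got {num_syndromes}")
    return num_syndromes // model_batch_size
```

For example, a model built with batch size 4 accepts 4, 8, or 12 syndromes per `decode_batch()` call, while a single syndrome passed to `decode()` is zero-padded up to 4.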
6 changes: 6 additions & 0 deletions docs/sphinx/examples_rst/qec/realtime_decoding.rst
@@ -529,6 +529,12 @@ Decoder Selection
^^^^^^^^^^^^^^^^^
The page `CUDA-Q QEC Decoders <https://nvidia.github.io/cudaqx/components/qec/introduction.html#pre-built-qec-decoders>`_ provides information about which decoders are compatible with real-time decoding.

The TRT decoder (``trt_decoder``) can be configured for real-time decoding by specifying
``trt_decoder_config`` parameters. This is useful for neural network-based
decoders trained for specific codes and noise models. Note that TRT models
must be trained with input and output dimensions that match the code's
syndrome and error spaces. See :ref:`trt_decoder_api_python` for detailed configuration options.
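As a hypothetical sketch (the exact YAML schema for the decoder configuration file is assumed here, not taken from the source; only the ``trt_decoder_config`` field names follow the documented attributes), a TRT decoder entry might look like:

```yaml
# Hypothetical fragment: surrounding keys are assumptions, only the
# trt_decoder_config fields mirror the documented configuration type.
decoders:
  - name: trt_decoder
    trt_decoder_config:
      onnx_load_path: decoder_model.onnx
      engine_save_path: decoder_model.engine
      precision: fp16
      memory_workspace: 1073741824
```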

Troubleshooting
---------------
