diff --git a/docs/sphinx/api/qec/python_realtime_decoding_api.rst b/docs/sphinx/api/qec/python_realtime_decoding_api.rst
index 99ddd7e6..2bff97ca 100644
--- a/docs/sphinx/api/qec/python_realtime_decoding_api.rst
+++ b/docs/sphinx/api/qec/python_realtime_decoding_api.rst
@@ -72,6 +72,45 @@ Configuration API
 The configuration API enables setting up decoders before circuit execution. Decoders are
 configured using YAML files or programmatically constructed configuration objects.
 
+Configuration Types
+^^^^^^^^^^^^^^^^^^^
+
+.. py:class:: cudaq_qec.trt_decoder_config
+
+   Configuration for the TensorRT decoder in the real-time decoding system.
+
+   **Attributes:**
+
+   .. py:attribute:: onnx_load_path
+      :type: Optional[str]
+
+      Path to an ONNX model file. Mutually exclusive with ``engine_load_path``.
+
+   .. py:attribute:: engine_load_path
+      :type: Optional[str]
+
+      Path to a pre-built TensorRT engine file. Mutually exclusive with
+      ``onnx_load_path``.
+
+   .. py:attribute:: engine_save_path
+      :type: Optional[str]
+
+      Path to save the built TensorRT engine for reuse.
+
+   .. py:attribute:: precision
+      :type: Optional[str]
+
+      Inference precision mode: ``"fp16"``, ``"bf16"``, ``"int8"``, ``"fp8"``,
+      ``"tf32"``, ``"noTF32"``, or ``"best"`` (default).
+
+   .. py:attribute:: memory_workspace
+      :type: Optional[int]
+
+      Workspace memory size in bytes (default: 1073741824 = 1 GB).
+
+Configuration Functions
+^^^^^^^^^^^^^^^^^^^^^^^
+
 .. py:function:: cudaq_qec.configure_decoders(config)
 
    Configure decoders from a multi_decoder_config object.
diff --git a/docs/sphinx/api/qec/trt_decoder_api.rst b/docs/sphinx/api/qec/trt_decoder_api.rst
index 590243f9..a0394fb9 100644
--- a/docs/sphinx/api/qec/trt_decoder_api.rst
+++ b/docs/sphinx/api/qec/trt_decoder_api.rst
@@ -10,6 +10,11 @@
 architecture and supports various precision modes (FP16, BF16, INT8, FP8)
 to balance accuracy and speed.
 
+Neural network-based decoders can be trained to perform syndrome decoding
+for specific quantum error correction codes and noise models. The TRT decoder
+provides a high-performance inference engine for these models, with automatic
+CUDA graph optimization for reduced latency.
+
 Requires a CUDA-capable GPU and TensorRT installation. See the
 `CUDA-Q GPU Compatibility List `_
 
@@ -80,6 +85,13 @@
 only required to satisfy the decoder interface. You can pass any valid
 parity check matrix of appropriate dimensions.
 
+.. note::
+   **Batch Processing**: The TRT decoder automatically handles batch-size
+   optimization. Models trained with `batch_size` > 1 receive
+   zero-padded inputs when `decode()` is called on a single syndrome. When
+   using `decode_batch()`, provide syndromes in multiples of the model's
+   batch size for optimal performance.
+
 :param H: Parity check matrix (tensor format). Note: This parameter is not
    used by the TRT decoder but is required by the decoder interface.
 :param params: Heterogeneous map of parameters:
@@ -116,3 +128,16 @@
    engine building (defaults to 1GB = 1073741824 bytes). Larger workspaces
    may allow TensorRT to explore more optimization strategies.
 
+   - `use_cuda_graph` (bool): Enable CUDA graph optimization for improved
+     performance (defaults to True). CUDA graphs capture inference operations
+     and replay them with reduced kernel launch overhead, providing roughly a
+     20% speedup. The optimization is applied automatically on the first decode
+     call and is automatically disabled for models with dynamic shapes or
+     multiple optimization profiles. Set to False to force the traditional
+     execution path.
+
+   - `batch_size` (automatic): The decoder automatically detects the model's
+     batch size from the first input dimension. For models with `batch_size` > 1,
+     the `decode()` method automatically zero-pads single syndromes to fill
+     the batch. The `decode_batch()` method requires the number of syndromes
+     to be an integral multiple of the model's batch size.
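The zero-padding and batch-multiple rules described for `decode()` and `decode_batch()` above can be sketched in plain Python. This is an illustrative sketch only; the helper names below are hypothetical and are not part of the `cudaq_qec` API, and no TensorRT installation is needed to follow it:

```python
def pad_to_batch(syndrome: list[int], batch_size: int) -> list[list[int]]:
    """Mimic decode() on a batched model: zero-pad a single syndrome
    into a full batch of `batch_size` rows (row 0 is the real input)."""
    padding_row = [0] * len(syndrome)
    return [syndrome] + [list(padding_row) for _ in range(batch_size - 1)]


def check_batch_multiple(num_syndromes: int, batch_size: int) -> None:
    """Mimic decode_batch()'s requirement: the number of syndromes must
    be an integral multiple of the model's batch size."""
    if num_syndromes % batch_size != 0:
        raise ValueError(
            f"{num_syndromes} syndromes is not a multiple of "
            f"batch size {batch_size}"
        )


# A single 3-bit syndrome fed to a model trained with batch_size = 4:
batch = pad_to_batch([1, 0, 1], batch_size=4)
print(len(batch))   # 4 rows: 1 real syndrome + 3 zero-padded rows
print(batch[1])     # [0, 0, 0]
```

Only the first row of the padded batch carries a real syndrome; results for the padding rows would simply be discarded by the decoder.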
diff --git a/docs/sphinx/examples_rst/qec/realtime_decoding.rst b/docs/sphinx/examples_rst/qec/realtime_decoding.rst
index 0c14d180..b1c02de9 100644
--- a/docs/sphinx/examples_rst/qec/realtime_decoding.rst
+++ b/docs/sphinx/examples_rst/qec/realtime_decoding.rst
@@ -529,6 +529,12 @@ Decoder Selection
 ^^^^^^^^^^^^^^^^^
 
 The page `CUDA-Q QEC Decoders `_ provides information about which decoders are compatible with real-time decoding.
+
+The TRT decoder (``trt_decoder``) can be configured for real-time decoding by specifying
+``trt_decoder_config`` parameters. This is useful for neural network-based decoders
+trained for specific codes and noise models. Note that TRT models must be trained with
+the appropriate input/output dimensions matching the syndrome and error spaces. See
+:ref:`trt_decoder_api_python` for detailed configuration options.
 
 Troubleshooting
 ---------------
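Since the configuration API accepts YAML files, a ``trt_decoder_config`` entry might look like the fragment below. This is a hypothetical sketch: only the ``trt_decoder_config`` field names (``onnx_load_path``, ``engine_save_path``, ``precision``, ``memory_workspace``) come from the documented attributes; the surrounding multi-decoder YAML structure (``decoders``, ``name``, ``type``) is an assumed shape, not a schema confirmed by these pages:

```yaml
# Hypothetical multi-decoder YAML sketch; verify the top-level schema
# against the multi_decoder_config documentation before use.
decoders:
  - name: decoder_0
    type: trt_decoder
    trt_decoder_config:
      # mutually exclusive with engine_load_path
      onnx_load_path: /path/to/model.onnx
      # cache the built engine so later runs skip the build step
      engine_save_path: /path/to/model.engine
      # one of: fp16, bf16, int8, fp8, tf32, noTF32, best (default)
      precision: fp16
      # workspace in bytes; 1073741824 (1 GB) is the default
      memory_workspace: 1073741824
```

On a subsequent run, swapping ``onnx_load_path`` for ``engine_load_path`` pointing at the saved engine would avoid rebuilding the TensorRT engine.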