From 90c5a6871fbd217d73f7090db774a72c10152968 Mon Sep 17 00:00:00 2001 From: Scott Thornton Date: Fri, 9 Jan 2026 23:54:54 +0000 Subject: [PATCH] docs(qec): update trt_decoder documentation for recent enhancements Update TRT decoder documentation to reflect features introduced in commits eff6966, 2954c01, and 49bdde8, bringing it inline with other QEC decoder documentation. Key additions: 1. CUDA Graph Optimization (commit 2954c01): - Document new `use_cuda_graph` parameter (default: True) - Note ~20% performance improvement from CUDA graph optimization - Explain automatic fallback for models with dynamic shapes 2. Batch Processing Support (commit eff6966): - Document automatic batch size detection - Explain zero-padding behavior for single syndrome decode() - Clarify decode_batch() requirements for batch-size multiples 3. Real-Time Decoding Integration (commit 49bdde8): - Add comprehensive trt_decoder_config documentation - Include Python and C++ examples for real-time configuration - Document YAML serialization support - Add configuration reference in python_realtime_decoding_api.rst 4. Documentation Structure Improvements: - Add performance characteristics section - Add batch processing notes - Include cross-references to real-time decoding examples - Maintain consistency with nv-qldpc and sliding_window decoder docs Files changed: - docs/sphinx/api/qec/trt_decoder_api.rst: Added parameters, real-time config section, and performance notes - docs/sphinx/api/qec/python_realtime_decoding_api.rst: Added trt_decoder_config class documentation - docs/sphinx/examples_rst/qec/realtime_decoding.rst: Added TRT decoder to decoder selection section Signed-off-by: Scott Thornton --- .../api/qec/python_realtime_decoding_api.rst | 39 +++++++++++++++++++ docs/sphinx/api/qec/trt_decoder_api.rst | 25 ++++++++++++ .../examples_rst/qec/realtime_decoding.rst | 6 +++ 3 files changed, 70 insertions(+) diff --git a/docs/sphinx/api/qec/python_realtime_decoding_api.rst b/docs/sphinx/api/qec/python_realtime_decoding_api.rst index 99ddd7e6..2bff97ca 100644 --- a/docs/sphinx/api/qec/python_realtime_decoding_api.rst +++ b/docs/sphinx/api/qec/python_realtime_decoding_api.rst @@ -72,6 +72,45 @@ Configuration API The configuration API enables setting up decoders before circuit execution. Decoders are configured using YAML files or programmatically constructed configuration objects. +Configuration Types +^^^^^^^^^^^^^^^^^^^ + +.. py:class:: cudaq_qec.trt_decoder_config + + Configuration for TensorRT decoder in real-time decoding system. + + **Attributes:** + + .. py:attribute:: onnx_load_path + :type: Optional[str] + + Path to ONNX model file. Mutually exclusive with engine_load_path. + + .. py:attribute:: engine_load_path + :type: Optional[str] + + Path to pre-built TensorRT engine file. Mutually exclusive with + onnx_load_path. + + .. py:attribute:: engine_save_path + :type: Optional[str] + + Path to save built TensorRT engine for reuse. + + .. py:attribute:: precision + :type: Optional[str] + + Inference precision mode: "fp16", "bf16", "int8", "fp8", "tf32", + "noTF32", or "best" (default). + + .. py:attribute:: memory_workspace + :type: Optional[int] + + Workspace memory size in bytes (default: 1073741824 = 1GB). + +Configuration Functions +^^^^^^^^^^^^^^^^^^^^^^^^ + .. py:function:: cudaq_qec.configure_decoders(config) Configure decoders from a multi_decoder_config object. diff --git a/docs/sphinx/api/qec/trt_decoder_api.rst b/docs/sphinx/api/qec/trt_decoder_api.rst index 590243f9..a0394fb9 100644 --- a/docs/sphinx/api/qec/trt_decoder_api.rst +++ b/docs/sphinx/api/qec/trt_decoder_api.rst @@ -10,6 +10,11 @@ architecture and supports various precision modes (FP16, BF16, INT8, FP8) to balance accuracy and speed. + Neural network-based decoders can be trained to perform syndrome decoding + for specific quantum error correction codes and noise models. The TRT decoder + provides a high-performance inference engine for these models, with automatic + CUDA graph optimization for reduced latency. + Requires a CUDA-capable GPU and TensorRT installation. See the `CUDA-Q GPU Compatibility List `_ @@ -80,6 +85,13 @@ only required to satisfy the decoder interface. You can pass any valid parity check matrix of appropriate dimensions. + .. note:: + **Batch Processing**: The TRT decoder automatically handles batch size + optimization. Models trained with batch_size > 1 will receive + zero-padded inputs when using `decode()` on a single syndrome. When + using `decode_batch()`, provide syndromes in multiples of the model's + batch size for optimal performance. + :param H: Parity check matrix (tensor format). Note: This parameter is not used by the TRT decoder but is required by the decoder interface. :param params: Heterogeneous map of parameters: @@ -116,3 +128,16 @@ engine building (defaults to 1GB = 1073741824 bytes). Larger workspaces may allow TensorRT to explore more optimization strategies. + - `use_cuda_graph` (bool): Enable CUDA graph optimization for improved + performance (defaults to True). CUDA graphs capture inference operations + and replay them with reduced kernel launch overhead, providing ~20% + speedup. The optimization is applied automatically on the first decode + call. Automatically disabled for models with dynamic shapes or + multiple optimization profiles. Set to False to force traditional + execution path. + + - `batch_size` (automatic): The decoder automatically detects the model's + batch size from the first input dimension. For models with batch_size > 1, + the `decode()` method automatically zero-pads single syndromes to fill + the batch. The `decode_batch()` method requires the number of syndromes + to be an integral multiple of the model's batch size. diff --git a/docs/sphinx/examples_rst/qec/realtime_decoding.rst b/docs/sphinx/examples_rst/qec/realtime_decoding.rst index 0c14d180..b1c02de9 100644 --- a/docs/sphinx/examples_rst/qec/realtime_decoding.rst +++ b/docs/sphinx/examples_rst/qec/realtime_decoding.rst @@ -529,6 +529,12 @@ Decoder Selection ^^^^^^^^^^^^^^^^^ The page `CUDA-Q QEC Decoders `_ provides information about which decoders are compatible with real-time decoding. +The TRT decoder (``trt_decoder``) can be configured for real-time decoding by specifying +``trt_decoder_config`` parameters. This is useful for neural network-based +decoders trained for specific codes and noise models. Note that TRT models +must be trained with the appropriate input/output dimensions matching the +syndrome and error spaces. See :ref:`trt_decoder_api_python` for detailed configuration options. + Troubleshooting ---------------