Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
b0330dc to 156fd04
Greptile Summary
This PR adds static metadata inference (ndim, dtype, layout) to
Confidence Score: 4/5
Safe to merge after fixing the join.cc negative-axis logic bug and the DataNode.__str__ label errors.
Two P1 defects are present: the negative-axis normalization in join.cc silently produces wrong layout metadata for any negative axis value, and DataNode.__str__ mislabels dtype and layout as ndim, making debug output unreliable. The remaining findings are P2 (dead code, typos, stale docstrings). The core schema/spec/graph infrastructure is well-structured and the validation logic is correct.
dali/operators/generic/join.cc (negative axis bug) and dali/python/nvidia/dali/data_node.py (wrong __str__ labels and typos)
Sequence Diagram
sequenceDiagram
participant PY as Python ops/__init__.py
participant DN as DataNode
participant OS as OpSpec
participant SC as OpSchema
participant GM as graph/node_meta.cc
participant OP as OperatorBase
PY->>DN: DataNode(name, device, source, index)
DN->>OS: spec.OutputDesc(index)
OS-->>DN: (name, device, ndim, dtype, layout)
PY->>OS: AddInput(name, device, ndim, dtype, layout)
PY->>OS: AddOutput(name, device)
Note over GM: Pipeline::Build()
GM->>GM: PropagateDataNodeMetadata (DFS)
GM->>OS: MutableInputDesc(i) ← producer OutputDesc
GM->>OS: InferOutputMetadata()
OS->>SC: CalculateOutputDType/NDim/Layout(i, spec)
SC-->>OS: optional<dtype/ndim/layout>
Note over OP: Execution
OP->>OP: Setup(ws, validate_metadata=true)
OP->>OP: ValidateInputMetadata(ws, spec)
OP->>OP: RunImpl(ws)
OP->>OP: ValidateOutputMetadata(ws, spec)
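The propagation step in the diagram (copy each producer's output descriptor into the consumer's input descriptor, then infer the consumer's output) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Desc`, `Node`, and `propagate` are hypothetical names, each node has a single output, and none of this is DALI's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Desc:
    """Static metadata; None means "not known statically"."""
    ndim: Optional[int] = None
    dtype: Optional[str] = None
    layout: Optional[str] = None


class Node:
    """Hypothetical single-output graph node (not a DALI class)."""

    def __init__(self, name: str, infer: Callable[[List[Desc]], Desc],
                 producers: Optional[List["Node"]] = None):
        self.name = name
        self.infer = infer                # computes output metadata from inputs
        self.producers = producers or []  # one producer per input
        self.inputs: List[Desc] = []
        self.output = Desc()


def propagate(node: Node, visited: set) -> None:
    """Depth-first: resolve all producers first, then this node."""
    if node.name in visited:
        return
    visited.add(node.name)
    for producer in node.producers:
        propagate(producer, visited)
    # Input descriptors are taken from the producers' output descriptors.
    node.inputs = [p.output for p in node.producers]
    node.output = node.infer(node.inputs)


reader = Node("reader", lambda ins: Desc(ndim=3, dtype="uint8", layout="HWC"))
cast = Node("cast", lambda ins: Desc(ins[0].ndim, "float32", ins[0].layout), [reader])
propagate(cast, set())
print(cast.output)  # Desc(ndim=3, dtype='float32', layout='HWC')
```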
Reviews (1): Last reviewed commit: "TODO(michalz): Fix exception tests."
if (!desc.layout || desc.layout->empty())
  continue;
if (axis < 0)
  axis = desc.layout->ndim() - axis;
Wrong negative-axis normalization sign
The subtraction ndim - axis produces the wrong index for every negative axis. For axis = -1 with ndim = 2, the result is 2 - (-1) = 3, which immediately fails the range check on the next line and silently returns nullopt — so OutputLayout for any join with a negative axis is always nullopt. Standard normalization is ndim + axis.
- axis = desc.layout->ndim() - axis;
+ axis = desc.layout->ndim() + axis;
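To make the arithmetic concrete, here is a small generic sketch of negative-axis normalization. The helper name `normalize_axis` and the half-open range check `[0, ndim)` are assumptions made for illustration; they are not taken from join.cc, whose valid range may differ.

```python
from typing import Optional


def normalize_axis(axis: int, ndim: int) -> Optional[int]:
    """Map a possibly negative axis into [0, ndim); return None if out of range."""
    if axis < 0:
        axis = ndim + axis  # add, not subtract: -1 refers to the last axis
    if axis < 0 or axis >= ndim:
        return None
    return axis


print(normalize_axis(-1, 2))  # 1
print(2 - (-1))               # 3: the out-of-range index the subtraction yields
```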
if self.dtype is not None:
    s += f", ndim={self.dtype}"
if self.layout is not None:
    s += f", ndim={repr(self.layout)}"
Wrong field labels in __str__ for dtype and layout
Both dtype and layout are formatted with the prefix ndim=, making the repr actively misleading for debugging. The labels should match the field names.
- if self.dtype is not None:
-     s += f", ndim={self.dtype}"
- if self.layout is not None:
-     s += f", ndim={repr(self.layout)}"
+ if self.ndim is not None:
+     s += f", ndim={self.ndim}"
+ if self.dtype is not None:
+     s += f", dtype={self.dtype}"
+ if self.layout is not None:
+     s += f", layout={repr(self.layout)}"
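For reference, a self-contained sketch of the corrected labelling. `NodeMeta` is a hypothetical stand-in class written for this example, not the real DataNode, and the exact output format is an assumption.

```python
class NodeMeta:
    """Hypothetical stand-in for a node carrying optional static metadata."""

    def __init__(self, name, ndim=None, dtype=None, layout=None):
        self.name = name
        self.ndim = ndim
        self.dtype = dtype
        self.layout = layout

    def __str__(self):
        s = f"NodeMeta({self.name!r}"
        # Each optional field is printed under its own name.
        if self.ndim is not None:
            s += f", ndim={self.ndim}"
        if self.dtype is not None:
            s += f", dtype={self.dtype}"
        if self.layout is not None:
            s += f", layout={repr(self.layout)}"
        return s + ")"


print(NodeMeta("images", ndim=3, dtype="uint8", layout="HWC"))
# NodeMeta('images', ndim=3, dtype=uint8, layout='HWC')
```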
const decltype(output_dtype_fn_) &output_dtype_fn = [&]() {
  if (!output_dtype_fn_.empty())
    return output_dtype_fn_;
  for (auto *parent : GetParents())
    if (!parent->output_dtype_fn_.empty())
      return parent->output_dtype_fn_;
  return output_dtype_fn_;
}();
Dead code: output_dtype_fn variable is never used
The local variable output_dtype_fn is computed by the immediately-invoked lambda but is never referenced again; the rest of the function delegates to OutputDTypeFn() / OutputDTypeFuncs() which already handle inheritance. This block can be removed without changing behaviour, and the compiler may warn about an unused variable.
    raise ValueError("Msmatch between OpSpec and explicit `ndim` argument.")
if dtype is not None and self.dtype is not None and dtype != self.dtype:
    raise ValueError("Msmatch between OpSpec and explicit `dtype` argument.")
if layout is not None and self.layout is not None and layout != self.layout:
    raise ValueError("Msmatch between OpSpec and explicit `layout` argument.")
Typo "Msmatch" in error messages
All three ValueError messages say "Msmatch" instead of "Mismatch".
-     raise ValueError("Msmatch between OpSpec and explicit `ndim` argument.")
- if dtype is not None and self.dtype is not None and dtype != self.dtype:
-     raise ValueError("Msmatch between OpSpec and explicit `dtype` argument.")
- if layout is not None and self.layout is not None and layout != self.layout:
-     raise ValueError("Msmatch between OpSpec and explicit `layout` argument.")
+ if ndim is not None and self.ndim is not None and ndim != self.ndim:
+     raise ValueError("Mismatch between OpSpec and explicit `ndim` argument.")
+ if dtype is not None and self.dtype is not None and dtype != self.dtype:
+     raise ValueError("Mismatch between OpSpec and explicit `dtype` argument.")
+ if layout is not None and self.layout is not None and layout != self.layout:
+     raise ValueError("Mismatch between OpSpec and explicit `layout` argument.")
    const OpSpec::InOutDesc &against,
    std::string_view category,
    NameType &&index_or_name) {
  if (what.num_samples() == 0)  // empty batch may have improper ndim/layuot, but we don't care
/** Gets the function that computes the output dtype for the given output.
 *
 * The returned function may be inherited from a parent schema.
 */
OutputNDimFunc OutputNDimFn(int index) const;

/** Gets the function that computes the output dtype for the given output.
 *
 * The returned function may be inherited from a parent schema.
 */
OutputLayoutFunc OutputLayoutFn(int index) const;
Copy-pasted docstrings for OutputNDimFn and OutputLayoutFn
Both functions are documented as "Gets the function that computes the output dtype for the given output", which is the docstring for OutputDTypeFn. The descriptions for OutputNDimFn and OutputLayoutFn should reference ndim and layout respectively.
Co-authored-by: Rostan Tabet <rtabet@nvidia.com>
Category:
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change
Description:
This change adds static metadata inference (ndim, layout, dtype) to OpSchema. Most operators can infer it from OpSpec.
OpSpec now carries the statically inferred metadata.
Actual inputs and outputs, as seen in the workspace, are now automatically validated against OpSpec in OperatorBase.
Breaking change: because metadata inference now has a default implementation, custom operators may break and need user attention. This is not ideal; some way to handle it would be nice (e.g. make DALI_SCHEMA behave differently when compiling libdali_operators).
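The validation of actual inputs and outputs against the statically declared metadata might look roughly like the sketch below. The function `check_metadata` and the dict-based descriptors are hypothetical and only illustrate the idea: fields that were not inferred (None) are not checked, and empty batches are skipped, as the quoted C++ snippet in the review does.

```python
def check_metadata(declared: dict, actual: dict, num_samples: int, what: str) -> None:
    """Raise ValueError if actual metadata contradicts the declared metadata."""
    if num_samples == 0:
        return  # an empty batch may carry improper ndim/layout; ignore it
    for key in ("ndim", "dtype", "layout"):
        expected = declared.get(key)
        if expected is None:
            continue  # not statically known, so nothing to validate
        if actual.get(key) != expected:
            raise ValueError(
                f"{what}: declared {key}={expected!r}, got {actual.get(key)!r}"
            )


declared = {"ndim": 3, "dtype": "uint8", "layout": None}
check_metadata(declared, {"ndim": 3, "dtype": "uint8", "layout": "HWC"}, 8, "output 0")

try:
    check_metadata(declared, {"ndim": 2, "dtype": "uint8", "layout": "HW"}, 8, "output 0")
except ValueError as err:
    print(err)  # output 0: declared ndim=3, got 2
```

A custom operator that relies on the default inference but produces different metadata at run time would fail a check of this kind, which is the breakage the description warns about.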
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A