(perf): zero-alloc OnTheFly ND + AutoCoeffs default for scalar oneshot#110
(perf): zero-alloc OnTheFly ND + AutoCoeffs default for scalar oneshot#110
OnTheFly ND + AutoCoeffs default for scalar oneshot#110Conversation
Two coupled performance improvements for the ND interpolation path: 1. Pool-based `_collapse_dims`: intermediate arrays now use `acquire!(pool, ...)` instead of `Array(undef, ...)`, achieving zero allocation after warmup on the OnTheFly interpolant callable path and gradient/hessian calls. 2. AutoCoeffs for ND oneshot (scalar query → OnTheFly): a single scalar query no longer pays the O(2^N × n^N) partial-build cost of PreCompute. Scalar `cubic_interp`/`quadratic_interp`/`interp` oneshot is ~2^N× faster (4× in 2D, 8× in 3D). Key changes: - `_collapse_dims` recursive case is annotated `@with_pool` and uses `acquire!`. Base case (1D scalar return) unchanged. - New `_resolve_coeffs_nd_oneshot` in `coeff_policy.jl` separates oneshot resolution from interpolant-construction resolution, avoiding dispatch ambiguity. Scalar query → OnTheFly, batch → PreCompute (future-proofed for batch threshold crossover). - New `_interp_nd_oneshot_onthefly` in `hetero_oneshot.jl` provides the OnTheFly oneshot entry (domain check → `_collapse_dims`). - `cubic_interp` / `quadratic_interp` ND oneshot defaults changed from `PreCompute()` to `AutoCoeffs()`; scalar path branches to the new OnTheFly entry, batch path continues to PreCompute. - `interp()` oneshot API now accepts `coeffs=AutoCoeffs()` kwarg; all `_interp_nd_oneshot_dispatch` overloads forward it. - `_to_quadratic_bc` helpers in `quadratic_nd_build.jl` convert abstract BCs (ZeroCurvBC, ZeroSlopeBC, PolyFit) to concrete `QuadraticBC` for 1D API compatibility. Uses `0 * sample` for duck-type safety (matches existing `_slope_1d_quadratic!` pattern). Fallback `@noinline` error for unsupported BCs. Tests (`test_nd_oneshot_onthefly.jl`, 23 testsets): - Numerical equivalence OnTheFly ≈ PreCompute: cubic 2D/3D, quadratic 2D+derivatives, hetero (Cubic×Linear, Cubic×Quadratic), derivatives, PeriodicBC (single + both axes), mixed extrap, FillExtrap short-circuit, Float32 data, Hermite family. - AutoCoeffs default behavior: cubic/quadratic/interp scalar and batch equivalence with explicit PreCompute. - Zero-allocation (<= ND_ALLOC_THRESHOLD): OnTheFly oneshot 2D/3D, hetero, interpolant callable, gradient, PeriodicBC, AutoCoeffs default. Full suite: 7967 tests pass.
Two regressions in the ND oneshot AutoCoeffs path flagged by code
review:
P1 — ForwardDiff.Dual regression with AutoCoeffs default
-----
Scalar queries with Dual elements hit OnTheFly through `_collapse_dims`,
whose intermediate buffers are typed from `eltype(data) = Float64` and
fail to store Dual fiber results (`MethodError: Float64(::Dual)`).
Fix: Narrow the scalar OnTheFly resolution in
`_resolve_coeffs_nd_oneshot` from `Tuple{Vararg{Real}}` to
`Tuple{Vararg{AbstractFloat}}`. Since `ForwardDiff.Dual <: Real` but
NOT `<: AbstractFloat`, Dual queries now dispatch to a new fallback
overload that returns `PreCompute()`, which correctly propagates
Dual through the pre-built partials kernel. Zero runtime cost —
resolved at compile time.
P2 — PreCompute + local Hermite crashes in oneshot
-----
The scalar oneshot `coeffs` kwarg path did not apply the existing
`_validate_nd_coeffs` check. A call like
`interp((x,y), data, q; method=(PchipInterp(), AkimaInterp()),
coeffs=PreCompute())` reached `_compute_nd_partials_hetero!` and
died with `_extract_bc(::PchipInterp)` MethodError instead of the
helpful ArgumentError raised by the interpolant-construction path.
Fix: Call `_validate_nd_coeffs(coeffs_resolved, method_tuple)` in
the scalar `interp()` oneshot before routing to OnTheFly/PreCompute
dispatch. Mirrors the interpolant-construction validation in
`hetero_interpolant.jl`.
Tests:
- New `@testset "AD: ForwardDiff.Dual scalar query"`: ForwardDiff
gradient through `cubic_interp`, `quadratic_interp`, and `interp`
with default AutoCoeffs, verifying finite results.
- New `@testset "Validation: PreCompute + local Hermite rejected"`:
`@test_throws ArgumentError` for Pchip×Akima, single-Pchip, and
Cubic×Pchip heterogeneous with PreCompute in the scalar oneshot.
Follow-up to code review: the prior test file covered the happy paths but missed several entry points exercised only through corner cases. Audit of why P1 (ForwardDiff.Dual) and P2 (PreCompute+Hermite oneshot) regressions were not caught revealed: - All existing ForwardDiff tests differentiate w.r.t. `data`, not `query`. `data`-Dual paths work because `Tv = eltype(data)` becomes Dual too; `query`-Dual paths (new in this PR via ND scalar oneshot) had no prior coverage. - The `coeffs` kwarg was entirely absent from ND oneshot before this PR, so `_validate_nd_coeffs` was never wired up for the oneshot surface, and no test exercised it. New coverage: - **Quadratic OnTheFly with all supported BCs**: Left/Right(QuadraticFit), MinCurvFit, ZeroCurvBC, ZeroSlopeBC. Loose rtol=1e-5 because OnTheFly (1D-per-fiber) and PreCompute (tensor-product nodal derivatives) take different numerical paths; same math, different ordering. - **`_to_quadratic_bc` error fallback**: PeriodicBC (not a QuadraticBC) must raise ArgumentError instead of MethodError. - **Hessian + Laplacian with OnTheFly interpolant**: previously only gradient was tested; now all three vector calculus ops covered. - **Trivial methods (Linear/Constant) through scalar `interp()`**: asserts `_all_trivial_methods` path produces numerically identical results to the dedicated `linear_interp` / `constant_interp` APIs. - **3D OnTheFly interpolant callable zero-alloc**: previously only 2D tested; adds 3D Cubic×Linear×Quadratic interpolant eval.
Follow-up investigation into the ~1e-6 divergence between OnTheFly and PreCompute for non-default quadratic BCs (ZeroCurvBC, MinCurvFit, ZeroSlopeBC). The initial "floating-point ordering" explanation was incorrect. Actual root cause: `_get_effective_bc_quadratic` in `quadratic_nd_build.jl` switches the BC to `Right(QuadraticFit())` whenever `p_src > 1` (i.e., when forming a mixed partial from an already-differentiated array). PreCompute's tensor-product kernel uses this "improved" BC for `d²f/dxdy`, while OnTheFly — being a sequential 1D composition via `_collapse_dims` — applies the user's BC uniformly at every 1D step and cannot replicate the switch. Consequences: - When the user's BC matches the data's actual boundary behavior, the two paths produce bit-identical results (verified with f(x,y) = sin(x)*(y-π), whose d²f/dy² = 0 exactly). - When the BC is a poor fit (e.g., ZeroCurvBC on sin*cos), the boundary error propagates through the different effective BCs and the two paths diverge at ~1e-6 relative. Both are still within the expected quadratic interpolation accuracy. - Divergence scales with BC-data mismatch: verified across three test functions with qualitatively different d²f/dy² at the boundary. This is a meaningful design discrepancy, not a bug. Documented: - New `@testset "Quadratic BC-exact data"` proves bit-identical equivalence when the BC is correct for the data. - Test comment in "Quadratic OnTheFly with all supported BCs" now cites the real root cause instead of "floating-point ordering". - `quadratic_interp` ND oneshot docstring gained a strategy selection section + `!!! note` explaining the discrepancy.
OnTheFly's `_collapse_dims` stores query-dependent intermediate results
in the pool buffer (unique among library paths — PreCompute stores
only data-derived partials, with query handled in scalar kernel
arithmetic). Previously the buffer was typed from `eltype(data)`, so
a `ForwardDiff.Dual` query over `Float64` data failed with
`MethodError: no method matching Float64(::Dual)` when the 1D fiber
eval tried to store its Dual result into a Float64 buffer.
The earlier PR's workaround routed scalar Dual queries to PreCompute
via `_resolve_coeffs_nd_oneshot` fallback. That preserved correctness
but silently overrode the user's explicit `coeffs=OnTheFly()` and
regressed performance (Dual scalar paid the 2^N× PreCompute cost).
It also did not fix the interpolant callable path `itp(::Dual, ::Dual)`
on an OnTheFly-backed `HeteroInterpolantND`, which was already broken
pre-PR (no dedicated test previously exercised it).
Proper fix: plumb a result-type parameter `::Type{Tr}` through
`_collapse_dims` and its callers. `Tr` is computed once at each entry
point by promoting the data eltype with the query eltypes:
```julia
Tr = _promote_query_eltype(Tv, q_eval)
```
`_promote_query_eltype` (new in `core/utils.jl`) recursively folds
`promote_type` over the query tuple via `Base.tail`. Unlike a
`@generated` body — which would suffer from world-age issues when
`promote_rule(Float64, Dual)` is defined in an extension module
loaded after FastInterpolations — this recursive formulation lets
Julia's normal inference resolve the promotion at the call site's
world age, so the Dual rule is visible and `Tr` constant-folds to a
concrete type.
Changes:
- `core/utils.jl`: new `_promote_query_eltype(Tv, q::Tuple)` helper.
- `hetero/hetero_eval.jl`: `_collapse_dims` recursive case now takes
`::Type{Tr}` as first arg and uses `acquire!(pool, Tr, ...)`. Base
case (1D) accepts `::Type` to preserve dispatch uniformity but
ignores it (scalar return, no buffer needed). `_eval_hetero_nd`
and `_eval_at_cell` compute `Tr` and forward it.
- `hetero/hetero_oneshot.jl`: `_interp_nd_oneshot_onthefly` computes
and forwards `Tr`.
- `hetero/hetero_nointerp.jl`: `@generated _eval_nointerp` emits the
`Tr` computation and forwards it to `_collapse_dims`.
- `core/coeff_policy.jl`: remove the scalar-Dual → PreCompute
fallback from `_resolve_coeffs_nd_oneshot`. All scalar Real queries
now resolve to OnTheFly when AutoCoeffs is active. Comment updated
to document that `_collapse_dims` handles Dual natively via `Tr`
promotion.
- `test/test_nd_oneshot_onthefly.jl`: `"AD: ForwardDiff.Dual"` testset
extended to cover explicit `coeffs=OnTheFly()` for cubic and
quadratic, heterogeneous oneshot, the previously-broken
`itp((Dual, Dual))` interpolant callable path, and an accuracy
check against the analytical gradient of `sin(x)*cos(y)`.
Verified: zero-alloc preserved for Float64 queries (function-barrier
tests pass at `ND_ALLOC_THRESHOLD = 0` on Julia 1.12+), all 5 new AD
subtests pass, existing hetero_nd / nointerp / test_nd_oneshot_onthefly
test files all pass.
OnTheFly ND + AutoCoeffs default for scalar oneshot
FastInterpolations.jl BenchmarksAll benchmarks (42 total, click to expand)
|
| Benchmark | Current | Previous | Imm. Ratio | Grad. Ratio | Tier |
|---|---|---|---|---|---|
10_nd_construct/bilinear_2d |
719.1 ns |
712.1 ns |
1.01 |
1.162 |
gradual |
10_nd_construct/trilinear_3d |
2027.6 ns |
1951.0 ns |
1.039 |
1.171 |
gradual |
1_cubic_oneshot/q00001 |
502.5 ns |
491.7 ns |
1.022 |
1.104 |
gradual |
2_cubic_construct/g0100 |
1481.0 ns |
1462.2 ns |
1.013 |
1.106 |
gradual |
5_linear_construct/g0100 |
40.6 ns |
34.8 ns |
1.167 |
0.942 |
immediate |
5_linear_construct/g1000 |
278.2 ns |
251.2 ns |
1.108 |
1.04 |
immediate |
Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)
This comment was automatically generated by Benchmark workflow.
There was a problem hiding this comment.
Pull request overview
This PR optimizes ND one-shot interpolation by making scalar queries default to an OnTheFly strategy (avoiding full partial-derivative builds), switching intermediate buffers to pool allocation for zero-allocation hot paths after warmup, and improving AD compatibility by promoting OnTheFly pool buffer element types to match query eltypes (e.g., ForwardDiff.Dual).
Changes:
- Introduces an OnTheFly ND one-shot path via
_collapse_dimsthat uses pool-acquired intermediate arrays and promotes buffer eltypes for AD. - Changes ND one-shot defaults (
cubic_interp,quadratic_interp, andinterpscalar oneshot) to useAutoCoeffs()(scalar → OnTheFly; batch → PreCompute). - Adds a comprehensive new test suite for ND one-shot OnTheFly/AutoCoeffs behavior, AD regressions, and zero-allocation checks.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_nd_oneshot_onthefly.jl | Adds coverage for OnTheFly vs PreCompute equivalence, AutoCoeffs defaults, allocation behavior, BC variants, and ForwardDiff regression cases. |
| test/runtests.jl | Registers the new ND one-shot OnTheFly/AutoCoeffs test file. |
| src/cubic/nd/cubic_nd_oneshot.jl | Changes default coeff strategy to AutoCoeffs() and routes scalar oneshot through OnTheFly when resolved. |
| src/quadratic/nd/quadratic_nd_oneshot.jl | Adds coeffs=AutoCoeffs() to the public API and routes scalar oneshot through OnTheFly with BC normalization. |
| src/quadratic/nd/quadratic_nd_build.jl | Adds _to_quadratic_bc to convert abstract BCs to 1D quadratic-compatible BCs for OnTheFly. |
| src/hetero/hetero_oneshot.jl | Adds a shared OnTheFly scalar oneshot implementation and plumbs coeffs through scalar/batch one-shot dispatch. |
| src/hetero/hetero_eval.jl | Refactors _collapse_dims to be pool-based and to accept a promoted buffer element type for AD safety. |
| src/hetero/hetero_nointerp.jl | Promotes pool buffer eltype for no-interp slicing path to support Dual queries without fallback. |
| src/core/utils.jl | Adds _promote_query_eltype to compute promoted element type via tuple-folding (world-age safe). |
| src/core/coeff_policy.jl | Adds _resolve_coeffs_nd_oneshot policy for ND oneshot strategy resolution (scalar vs batch). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #110 +/- ##
==========================================
- Coverage 96.66% 96.56% -0.10%
==========================================
Files 136 136
Lines 11055 11083 +28
==========================================
+ Hits 10686 10702 +16
- Misses 369 381 +12
🚀 New features to boost your workflow:
|
…olution Address Copilot review on PR #110 with a cleaner architectural fix. Previously, ND batch one-shot with any local Hermite method (`PchipInterp`, `AkimaInterp`, `CardinalInterp`) — homogeneous *or* heterogeneous, with any `coeffs` — failed with cryptic `MethodError: _extract_bc(::PchipInterp)`, because the heterogeneous batch fallback always took the PreCompute partials path even when explicit `coeffs=OnTheFly()` was passed. Architectural fix (single source of truth, no new functions): 1. `_resolve_coeffs_nd_oneshot` is now method-aware via the existing `_has_any_local_method` type-trait fold. AutoCoeffs + batch + local Hermite resolves to `OnTheFly()` (the only working strategy for those methods); all other batch cases keep `PreCompute()` for build amortization. 2. The heterogeneous batch dispatch (`_interp_nd_oneshot_batch_dispatch!`) now branches on the resolved `coeffs`: when `OnTheFly`, it loops the existing scalar `_interp_nd_oneshot_onthefly` per query — reusing the one-shot infrastructure rather than introducing a new batch backend. 3. `interp!` validates the resolved strategy via the existing scalar `_validate_nd_coeffs`, so explicit `PreCompute() + Hermite + batch` still raises a clear `ArgumentError` (mirroring the scalar `interp` path); AutoCoeffs never trips the validation because it auto-routes to OnTheFly. Fully compile-time fold-able for non-Hermite method tuples (concrete `_is_local_method` dispatch), so zero runtime overhead on the supported PreCompute paths. Also fixes a stale `quadratic_interp` docstring that wrongly claimed `AutoCoeffs` falls back to `PreCompute()` for `ForwardDiff.Dual` queries (scalar Dual queries flow through OnTheFly since `Dual <: Real`), and stale internal comments referencing the old `_output_eltype` promotion path. New tests in `test_nd_oneshot_onthefly.jl` verify AutoCoeffs+Hermite+batch matches the scalar one-shot reference for `(Pchip,Pchip)`, `(Akima,Akima)`, `(Cubic,Pchip)`, and `(Pchip,Linear)`, and that explicit `PreCompute() + Hermite + batch` rejects with ArgumentError.
…tency fix
Two cross-validation assertions in `test_autodiff_Zygote.jl` ("eval — L2 loss"
and "gradient — ‖∇f‖² loss" inside the Quadratic ND interpolant API testset)
fail with the new `AutoCoeffs` default introduced in 438d17e because the
seed values are computed via the one-shot API (now `OnTheFly`) while Zygote
internally traces the interpolant constructor (`PreCompute`). For Quadratic
non-default BCs the two paths disagree by ~1e-6 due to the documented
mixed-partial BC switch in `_get_effective_bc_quadratic`.
Replace `atol = 1e-10` with `rtol`:
- L2 loss test: `rtol = 1e-4` (observed min ~1.4e-5, ~7× margin)
- gradient ‖∇f‖² loss test: `rtol = 1e-3` (observed min ~1.6e-4, ~6× margin)
`rtol` semantics ("1 part in N") are more meaningful for these
Frobenius-norm comparisons than the original absolute tolerance — and
unlike `atol`, they remain robust as grid sizes / data magnitudes change.
TODO comments added on both assertions: tighten back to `atol = 1e-10` once
the follow-up PR resolves the Quadratic OnTheFly/PreCompute mixed-partial BC
inconsistency, which will restore bit-equality between the two paths.
…tency fix Follow-up to 8f23a2c — the same cross-validation pattern is asserted in the Enzyme extension tests (`test_autodiff_Enzyme.jl`), and the Julia 1.x CI run on commit 8f23a2c fails the Quadratic subcases of: - `ND struct API — ∂/∂data via Enzyme` › "eval — L2 loss" (line 587) - `ND struct vector calculus — ∂/∂data via Enzyme` › "gradient — ‖∇f‖² loss" (line 749) Root cause is identical to the Zygote case: Enzyme traces the interpolant constructor (`PreCompute`) while the cross-validation seed is computed via the one-shot API (now `OnTheFly` under `AutoCoeffs`), and the two paths disagree by ~1e-6 for Quadratic non-default BCs due to the documented mixed-partial BC switch in `_get_effective_bc_quadratic`. Apply the same `rtol` relaxation used in the Zygote fix: - L2 loss: `rtol = 1e-4` (observed min ~1.4e-5, ~7× margin) - gradient ‖∇f‖² loss: `rtol = 1e-3` (observed min ~1.6e-4, ~6× margin) Both TODO comments cross-reference the matching Zygote test lines so the follow-up PR that fixes the BC consistency can tighten all four assertions in one go.
The ND PreCompute build helpers `_get_effective_bc` (cubic) and `_get_effective_bc_quadratic` previously substituted `CubicFit()` / `Right(QuadraticFit())` for the user BC when building mixed partials (`p_src > 1`). This produced an axis-asymmetric tensor product and a numerical drift vs. the OnTheFly composition path (~1e-10 cubic, ~1e-5 quadratic for non-default BCs). PR #110 documented the discrepancy and relaxed four autodiff test tolerances; this commit removes the substitution at the root. Both helpers now return the user BC for every partial. By linearity and C^1 continuity of the 1D Hermite build, this makes the stored mixed partial exactly Σ_{k,l} φ_k'(x_i) ψ_l'(y_j) f[k,l], which is OnTheFly's composition formula, so PreCompute and OnTheFly become bit-equivalent (modulo a few ULPs of FP reordering noise) for every user BC. The short-grid defensive fallback (cubic: < 4 points, quadratic: < 3 points) is preserved but now emits an informative one-shot `@warn maxlog=1` via a `@noinline` helper before substituting the default. Side effects: - The cubic ND adjoint in `cubic_nd_adjoint.jl` retains its `caches`/`mixed_caches` dual-cache plumbing. After this fix the two are structurally equivalent for the common case (non-periodic, length >= 4); the redundancy is harmless and a follow-up PR can collapse them. An inline comment marks the opportunity. - The quadratic ND oneshot docstring previously carried a `!!! note "Strategy discrepancy"` explaining the ~1e-6 drift. That note is removed and replaced with a plain statement that the two strategies are bit-equivalent, with a pointer to `claudedocs/TODO/00_HIGHEST_PRIORITY_mixed_partial_bc_fix.md` for historical context. PeriodicBC is cubic-only in this codebase, so the quadratic helper does not need a Rule-2-equivalent propagation check.
…d check Downstream verification that the mixed-partial BC consistency fix works end-to-end through the autodiff pipeline. PR #110 temporarily relaxed four assertions in the ND quadratic autodiff tests (two in `test_autodiff_Zygote.jl`, two in `test_autodiff_Enzyme.jl`) because Zygote/Enzyme trace the interpolant constructor (`PreCompute`) while the cross-validation seed comes from the one-shot API (now `OnTheFly` under `AutoCoeffs`), and the two paths disagreed by ~1e-6 for non-default quadratic BCs. With the BC consistency fix, PreCompute and OnTheFly produce bit-equivalent results, so the original `atol = 1.0e-10` is restored at all four sites and the TODO comment blocks are removed: - test/ext/test_autodiff_Zygote.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 - test/ext/test_autodiff_Enzyme.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 Also adds a new testset "AutoCoeffs: quadratic AD seed matches PreCompute exactly" in `test_nd_oneshot_onthefly.jl` that uses `==` (bit-exact equality, no tolerance) to compare OnTheFly and PreCompute for the exact call patterns autodiff uses — value query, ∂/∂x, ∂/∂y. This is a direct regression pin for the autodiff seed consistency and is stronger than the `atol = 50*eps` tolerance used in the new `test_nd_mixed_partial_bc_consistency.jl`. Verified locally: - test_autodiff_Zygote.jl passes (2m 29s) - test_autodiff_Enzyme.jl passes (68 pass, 2 broken pre-existing, 20m 49s) - test_nd_oneshot_onthefly.jl passes (1m 24s)
…ion path Two related bugs in the GridIdx + coeffs interaction shipped with PR #110: 1. Scalar `interp` validated `coeffs` against the *un-promoted* method tuple, so `interp((x, y), data, (qx, GridIdx(k)); method=(CubicInterp(), PchipInterp()), coeffs=PreCompute())` rejected the call even though the GridIdx auto-promotion would have replaced PchipInterp with NoInterp, leaving a 1-D cubic problem that PreCompute fully supports. Fix: run `_promote_grididx_to_nointerp` BEFORE `_validate_nd_coeffs` so the validation sees the methods that actually drive dispatch. 2. Batch `interp!` delegating to `_interp_batch_with_grididx!` did not forward the `coeffs` kwarg, so explicit `coeffs=PreCompute()` / `coeffs=OnTheFly()` were silently dropped on the GridIdx batch path and the reduced sub-problem ran with the default AutoCoeffs fallback. Fix: add a `coeffs` kwarg to `_interp_batch_with_grididx!` and forward it at both delegation sites (early-return for fully-expanded queries and the main reduced-problem `interp!` call). Tests cover: - Scalar: GridIdx-on-Pchip + PreCompute now succeeds and matches the manually-sliced 1-D cubic call. The non-EvalValue deriv path (which disables promotion) still correctly rejects. - Batch: NoInterp axis is GridIdx-sliced → reduced 1-D Cubic problem accepts both PreCompute and OnTheFly. Differentiator: non-NoInterp GridIdx (Pchip remains in the method tuple) now correctly REJECTS `coeffs=PreCompute()` instead of silently dropping it. Both fixes verified in TDD red→green order. Touched suites all green: test_nointerp.jl, test_nd_mixed_partial_bc_consistency.jl, test_nd_oneshot_onthefly.jl, test_hetero_nd.jl, test_hetero_oneshot.jl, test_hetero_precomputed.jl.
The ND PreCompute build helpers `_get_effective_bc` (cubic) and `_get_effective_bc_quadratic` previously substituted `CubicFit()` / `Right(QuadraticFit())` for the user BC when building mixed partials (`p_src > 1`). This produced an axis-asymmetric tensor product and a numerical drift vs. the OnTheFly composition path (~1e-10 cubic, ~1e-5 quadratic for non-default BCs). PR #110 documented the discrepancy and relaxed four autodiff test tolerances; this commit removes the substitution at the root. Both helpers now return the user BC for every partial. By linearity and C^1 continuity of the 1D Hermite build, this makes the stored mixed partial exactly Σ_{k,l} φ_k'(x_i) ψ_l'(y_j) f[k,l], which is OnTheFly's composition formula, so PreCompute and OnTheFly become bit-equivalent (modulo a few ULPs of FP reordering noise) for every user BC. The short-grid defensive fallback (cubic: < 4 points, quadratic: < 3 points) is preserved but now emits an informative one-shot `@warn maxlog=1` via a `@noinline` helper before substituting the default. Side effects: - The cubic ND adjoint in `cubic_nd_adjoint.jl` retains its `caches`/`mixed_caches` dual-cache plumbing. After this fix the two are structurally equivalent for the common case (non-periodic, length >= 4); the redundancy is harmless and a follow-up PR can collapse them. An inline comment marks the opportunity. - The quadratic ND oneshot docstring previously carried a `!!! note "Strategy discrepancy"` explaining the ~1e-6 drift. That note is removed and replaced with a plain statement that the two strategies are bit-equivalent, with a pointer to `claudedocs/TODO/00_HIGHEST_PRIORITY_mixed_partial_bc_fix.md` for historical context. PeriodicBC is cubic-only in this codebase, so the quadratic helper does not need a Rule-2-equivalent propagation check.
…d check Downstream verification that the mixed-partial BC consistency fix works end-to-end through the autodiff pipeline. PR #110 temporarily relaxed four assertions in the ND quadratic autodiff tests (two in `test_autodiff_Zygote.jl`, two in `test_autodiff_Enzyme.jl`) because Zygote/Enzyme trace the interpolant constructor (`PreCompute`) while the cross-validation seed comes from the one-shot API (now `OnTheFly` under `AutoCoeffs`), and the two paths disagreed by ~1e-6 for non-default quadratic BCs. With the BC consistency fix, PreCompute and OnTheFly produce bit-equivalent results, so the original `atol = 1.0e-10` is restored at all four sites and the TODO comment blocks are removed: - test/ext/test_autodiff_Zygote.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 - test/ext/test_autodiff_Enzyme.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 Also adds a new testset "AutoCoeffs: quadratic AD seed matches PreCompute exactly" in `test_nd_oneshot_onthefly.jl` that uses `==` (bit-exact equality, no tolerance) to compare OnTheFly and PreCompute for the exact call patterns autodiff uses — value query, ∂/∂x, ∂/∂y. This is a direct regression pin for the autodiff seed consistency and is stronger than the `atol = 50*eps` tolerance used in the new `test_nd_mixed_partial_bc_consistency.jl`. Verified locally: - test_autodiff_Zygote.jl passes (2m 29s) - test_autodiff_Enzyme.jl passes (68 pass, 2 broken pre-existing, 20m 49s) - test_nd_oneshot_onthefly.jl passes (1m 24s)
…ion path Two related bugs in the GridIdx + coeffs interaction shipped with PR #110: 1. Scalar `interp` validated `coeffs` against the *un-promoted* method tuple, so `interp((x, y), data, (qx, GridIdx(k)); method=(CubicInterp(), PchipInterp()), coeffs=PreCompute())` rejected the call even though the GridIdx auto-promotion would have replaced PchipInterp with NoInterp, leaving a 1-D cubic problem that PreCompute fully supports. Fix: run `_promote_grididx_to_nointerp` BEFORE `_validate_nd_coeffs` so the validation sees the methods that actually drive dispatch. 2. Batch `interp!` delegating to `_interp_batch_with_grididx!` did not forward the `coeffs` kwarg, so explicit `coeffs=PreCompute()` / `coeffs=OnTheFly()` were silently dropped on the GridIdx batch path and the reduced sub-problem ran with the default AutoCoeffs fallback. Fix: add a `coeffs` kwarg to `_interp_batch_with_grididx!` and forward it at both delegation sites (early-return for fully-expanded queries and the main reduced-problem `interp!` call). Tests cover: - Scalar: GridIdx-on-Pchip + PreCompute now succeeds and matches the manually-sliced 1-D cubic call. The non-EvalValue deriv path (which disables promotion) still correctly rejects. - Batch: NoInterp axis is GridIdx-sliced → reduced 1-D Cubic problem accepts both PreCompute and OnTheFly. Differentiator: non-NoInterp GridIdx (Pchip remains in the method tuple) now correctly REJECTS `coeffs=PreCompute()` instead of silently dropping it. Both fixes verified in TDD red→green order. Touched suites all green: test_nointerp.jl, test_nd_mixed_partial_bc_consistency.jl, test_nd_oneshot_onthefly.jl, test_hetero_nd.jl, test_hetero_oneshot.jl, test_hetero_precomputed.jl.
) * (fix): apply user BC to ND mixed partials (restores Clairaut symmetry) The ND PreCompute build helpers `_get_effective_bc` (cubic) and `_get_effective_bc_quadratic` previously substituted `CubicFit()` / `Right(QuadraticFit())` for the user BC when building mixed partials (`p_src > 1`). This produced an axis-asymmetric tensor product and a numerical drift vs. the OnTheFly composition path (~1e-10 cubic, ~1e-5 quadratic for non-default BCs). PR #110 documented the discrepancy and relaxed four autodiff test tolerances; this commit removes the substitution at the root. Both helpers now return the user BC for every partial. By linearity and C^1 continuity of the 1D Hermite build, this makes the stored mixed partial exactly Σ_{k,l} φ_k'(x_i) ψ_l'(y_j) f[k,l], which is OnTheFly's composition formula, so PreCompute and OnTheFly become bit-equivalent (modulo a few ULPs of FP reordering noise) for every user BC. The short-grid defensive fallback (cubic: < 4 points, quadratic: < 3 points) is preserved but now emits an informative one-shot `@warn maxlog=1` via a `@noinline` helper before substituting the default. Side effects: - The cubic ND adjoint in `cubic_nd_adjoint.jl` retains its `caches`/`mixed_caches` dual-cache plumbing. After this fix the two are structurally equivalent for the common case (non-periodic, length >= 4); the redundancy is harmless and a follow-up PR can collapse them. An inline comment marks the opportunity. - The quadratic ND oneshot docstring previously carried a `!!! note "Strategy discrepancy"` explaining the ~1e-6 drift. That note is removed and replaced with a plain statement that the two strategies are bit-equivalent, with a pointer to `claudedocs/TODO/00_HIGHEST_PRIORITY_mixed_partial_bc_fix.md` for historical context. PeriodicBC is cubic-only in this codebase, so the quadratic helper does not need a Rule-2-equivalent propagation check. * (test): add ND mixed-partial BC consistency regression suite New dedicated test file `test_nd_mixed_partial_bc_consistency.jl` pins the mathematical invariants that the BC fix in the preceding commit restores. Authored as the RED phase of a TDD cycle: on the pre-fix commit the math-invariant groups fail by the bug magnitudes documented in the design doc (~1e-10 cubic, ~1e-5 quadratic); after the fix they pass to `atol = 50*eps(Float64)` (~1.1e-14), which is generous enough to absorb tridiagonal / recurrence FP reordering but orders of magnitude tighter than any bug signature. Five test groups: A. PreCompute ↔ OnTheFly bit-equivalence across BCs (ZeroCurvBC, ZeroSlopeBC, CubicFit, MinCurvFit, Right(QuadraticFit)) and data functions (sin*cos, sin*y^3, sin*(y-π)) in 2D and 3D for both cubic and quadratic. B. Clairaut / axis-swap symmetry: building on `(xs, ys)` with `data` vs `(ys, xs)` with `permutedims(data)` and querying at swapped coordinates must give the same value. This is the strongest regression guard: the pre-fix PreCompute violated this by up to ~1e-5 for quadratic ZeroSlopeBC and ~1e-9 for cubic. C. Hessian off-diagonal symmetry (H[1,2] ≈ H[2,1]) for PreCompute interpolants with non-default BCs in both cubic and quadratic. Note: this assertion happens to pass on the buggy code because a single build is internally symmetric; we keep it as a forward regression guard. D. Cubic periodic-on-one-axis propagation regression-pin. Rule 2 in `_get_effective_bc` already handles this; the test asserts it continues to work after the fix. (Quadratic does not support PeriodicBC in this codebase.) E. Short-grid fallback warnings. Tests the helpers directly via `FastInterpolations._get_effective_bc` / `_get_effective_bc_quadratic` on very short grids (cubic: 3 points, quadratic: 2 points) with non-PolyFit user BCs. Asserts the `@warn ... maxlog=1` fires and the fallback BC is returned (`ZeroCurvBC()` / `MinCurvFit()`). Uses `@test_logs` for warning capture. Registered in `test/runtests.jl` next to `test_nd_oneshot_onthefly.jl`. * (test): restore tightened atol=1e-10 for ND AD + add bit-exact AD seed check Downstream verification that the mixed-partial BC consistency fix works end-to-end through the autodiff pipeline. PR #110 temporarily relaxed four assertions in the ND quadratic autodiff tests (two in `test_autodiff_Zygote.jl`, two in `test_autodiff_Enzyme.jl`) because Zygote/Enzyme trace the interpolant constructor (`PreCompute`) while the cross-validation seed comes from the one-shot API (now `OnTheFly` under `AutoCoeffs`), and the two paths disagreed by ~1e-6 for non-default quadratic BCs. With the BC consistency fix, PreCompute and OnTheFly produce bit-equivalent results, so the original `atol = 1.0e-10` is restored at all four sites and the TODO comment blocks are removed: - test/ext/test_autodiff_Zygote.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 - test/ext/test_autodiff_Enzyme.jl - "Quadratic eval — L2 loss" : rtol=1e-4 → atol=1e-10 - "Quadratic gradient — ‖∇f‖² loss" : rtol=1e-3 → atol=1e-10 Also adds a new testset "AutoCoeffs: quadratic AD seed matches PreCompute exactly" in `test_nd_oneshot_onthefly.jl` that uses `==` (bit-exact equality, no tolerance) to compare OnTheFly and PreCompute for the exact call patterns autodiff uses — value query, ∂/∂x, ∂/∂y. This is a direct regression pin for the autodiff seed consistency and is stronger than the `atol = 50*eps` tolerance used in the new `test_nd_mixed_partial_bc_consistency.jl`. Verified locally: - test_autodiff_Zygote.jl passes (2m 29s) - test_autodiff_Enzyme.jl passes (68 pass, 2 broken pre-existing, 20m 49s) - test_nd_oneshot_onthefly.jl passes (1m 24s) * (test): update characterization tests to match post-fix BC contract The full-suite run on the branch uncovered one hard failure and several pieces of documentation drift that encoded the pre-fix buggy behavior. All are fixed in this commit. **test/test_nd_coverage.jl** - `_get_effective_bc edge cases` (cubic, line 141): the test asserted `_get_effective_bc(ZeroCurvBC(), 2, grid_long) isa CubicFit` which was a characterization test on the now-removed substitution. Updated to assert `isa ZeroCurvBC` (the user BC is returned). - Cubic short-grid edge case (line 138): the fallback branch now emits an informative `@warn maxlog=1`. Wrapped the call in `@test_logs (:warn, r"Cubic ND mixed-partial build")` to assert the warning and suppress it from test output. - `_get_effective_bc_quadratic edge cases` (line 352): symmetric update — the quadratic short-grid branch also now emits `@warn maxlog=1`. Wrapped the call in `@test_logs (:warn, r"Quadratic ND mixed-partial build")`. **test/test_nd_oneshot_onthefly.jl** - "Quadratic OnTheFly with all supported BCs" (line 455): the long NOTE describing the ~1e-6 divergence as expected behavior was rewritten to describe the post-fix reality (PreCompute ≡ OnTheFly bit-equivalent for every user BC). The assertion tolerance is tightened from `rtol = 1e-5` to `atol = 50 * eps(Float64)` (~1.1e-14), matching the new regression suite. - "Quadratic BC-exact data" (line 479): the explanatory comment was rewritten as a historical note. The test is retained as a coarse smoke check; bit-identity now holds for arbitrary smooth data, not only BC-exact data, so the original motivation no longer applies. Verified locally: - test_nd_coverage.jl passes (30s) - test_nd_oneshot_onthefly.jl passes (1m 19s) * (test): add direct branch-coverage tests for BC consistency helpers Adds a new test group F to `test_nd_mixed_partial_bc_consistency.jl` that directly invokes `_get_effective_bc` and `_get_effective_bc_quadratic` with inputs hitting every distinct return statement in both helpers. Groups A–C cover the helpers indirectly through the full ND build pipeline, but a few branches (e.g., `get_polyfit_degree(bc) > 0` short-circuit; `PeriodicBC` Rule 2 early return) are not trivially reached that way. Group F exists so `src/cubic/nd/cubic_nd_build.jl` and `src/quadratic/nd/ quadratic_nd_build.jl` land at ≥ 95% line coverage for the patch. Also consolidates group E's two `@test_logs` calls into single `match_mode = :any` blocks (warning assertion and return-value assertion together) so the short-grid fallback emits exactly one captured log per helper, with no stray stderr output from a second call outside `@test_logs`. Branches covered by group F: - Cubic `_get_effective_bc`: - Rule 1 (p_src == 1) for three BC types - Rule 2 (periodic) for long and short grids - Rule 3 left branch (`get_polyfit_degree > 0`) for both lengths - Rule 3 right branch (`length >= 4`) for ZeroSlope/ZeroCurv - Quadratic `_get_effective_bc_quadratic`: - Rule 1 for Right(QuadraticFit), MinCurvFit, ZeroCurv - Rule 2 (length >= 3) for three BC types The short-grid fallback path for both helpers is still covered by group E's `@test_logs` blocks — those calls execute every line of both the helper body and the `@noinline` warning helper. * test(nd-bc-consistency): close line-coverage gap in quadratic helper - Restructure `_get_effective_bc_quadratic` to use explicit `if/end` for the length-≥3 branch, mirroring cubic's `_get_effective_bc` control flow so Julia coverage attribution lands on every executable line. - Add direct short-grid invocations to group F (cubic + quadratic). The group E `@test_logs` macro swallows line counters for the wrapped helper body (a known Julia coverage attribution quirk); the unwrapped direct calls give ground-truth line attribution. `maxlog=1` keeps the output silent since group E already fired the warning. Patch line coverage on this branch: 77/77 = 100.0%. * docs+test: address review feedback on BC consistency fix Source: - quadratic_nd_oneshot.jl: docstring lied about AutoCoeffs routing for scalar quadratic queries — said "OnTheFly" but the implementation unconditionally routes AutoCoeffs to PreCompute (intentional, for bit-exact AD seed agreement with the interpolant constructor). Updated the docstring to reflect actual behavior and explain why cubic differs. Also updated the design-doc reference to its archived location. Tests: - Move testset D (cubic periodic propagation) to its declared position between C and E, matching the lettered header comment. - Add a second 3D axis-swap permutation in group B (axes 1↔2 in addition to the existing 1↔3) for both cubic and quadratic. - Add a `!iszero` Hessian-diagonal sanity assertion to group C so a hypothetical bug zeroing the entire matrix would not silently pass. - Clarify the explanatory comment in group F: the second short-grid call is silent because there is no `@test_logs` wrapper, not because `maxlog=1` is exhausted (Test.jl bypasses the maxlog counter). All targeted tests still green: - test_nd_mixed_partial_bc_consistency.jl - test_nd_oneshot_onthefly.jl * fix(grididx): honor explicit coeffs kwarg through GridIdx auto-promotion path Two related bugs in the GridIdx + coeffs interaction shipped with PR #110: 1. Scalar `interp` validated `coeffs` against the *un-promoted* method tuple, so `interp((x, y), data, (qx, GridIdx(k)); method=(CubicInterp(), PchipInterp()), coeffs=PreCompute())` rejected the call even though the GridIdx auto-promotion would have replaced PchipInterp with NoInterp, leaving a 1-D cubic problem that PreCompute fully supports. Fix: run `_promote_grididx_to_nointerp` BEFORE `_validate_nd_coeffs` so the validation sees the methods that actually drive dispatch. 2. Batch `interp!` delegating to `_interp_batch_with_grididx!` did not forward the `coeffs` kwarg, so explicit `coeffs=PreCompute()` / `coeffs=OnTheFly()` were silently dropped on the GridIdx batch path and the reduced sub-problem ran with the default AutoCoeffs fallback. Fix: add a `coeffs` kwarg to `_interp_batch_with_grididx!` and forward it at both delegation sites (early-return for fully-expanded queries and the main reduced-problem `interp!` call). Tests cover: - Scalar: GridIdx-on-Pchip + PreCompute now succeeds and matches the manually-sliced 1-D cubic call. The non-EvalValue deriv path (which disables promotion) still correctly rejects. - Batch: NoInterp axis is GridIdx-sliced → reduced 1-D Cubic problem accepts both PreCompute and OnTheFly. Differentiator: non-NoInterp GridIdx (Pchip remains in the method tuple) now correctly REJECTS `coeffs=PreCompute()` instead of silently dropping it. Both fixes verified in TDD red→green order. Touched suites all green: test_nointerp.jl, test_nd_mixed_partial_bc_consistency.jl, test_nd_oneshot_onthefly.jl, test_hetero_nd.jl, test_hetero_oneshot.jl, test_hetero_precomputed.jl. * docs: reword "bit-equivalent" → "agreement to ~ULP" per Copilot review Copilot review on PR #111 flagged that docstrings/comments described PreCompute ↔ OnTheFly as "bit-equivalent" / "bit-identical" while the actual assertions use `isapprox(...; atol=50*eps(Float64))`. The wording overstates the guarantee — the residual difference is FP reordering noise in the tridiagonal/recurrence solver, ~50 ULPs at most, not zero. Reworded all four sites: - src/quadratic/nd/quadratic_nd_build.jl:233 — "bit-equivalence" → "numerical equivalence within FP noise" - src/cubic/nd/cubic_nd_build.jl:358 — short-grid warning text now says "no longer match the OnTheFly composition" instead of "not bit-equivalent" - test/test_nd_mixed_partial_bc_consistency.jl header + group-A label — "bit-equivalence" → "agreement to ~ULP / machine epsilon" - test/test_nd_oneshot_onthefly.jl:457 — "bit-equivalent results" → "agree to ~ULP tolerance" The single legitimate "bit-identical" usage at test_nd_oneshot_onthefly.jl lines 469-481 is preserved because that testset uses `==` (true bit equality) on data with `d²f/dy² = 0` everywhere, where both paths agree exactly. The "**not** bit-identical" wording in quadratic_nd_oneshot.jl:131 is also preserved — it correctly negates the property in the context of explaining why quadratic AutoCoeffs forces PreCompute. No code logic changed; comment-only commit.
* (feat): ND forwarders for pchip/cardinal/akima — unified `interp` entry Add ND-shape methods to `pchip_interp`/`cardinal_interp`/`akima_interp` (and their in-place `!` variants) that forward to `interp(grids, data; method=...)`. Covers all four ND call shapes: interpolant construction, scalar one-shot, batch one-shot, and in-place batch. Lets users keep the per-method entry point they already use in 1D instead of rewriting to `interp(...; method=PchipInterp())` by hand. `cardinal_interp` forwarders thread `tension` into `CardinalInterp(tension)`. `CubicHermiteInterp` is intentionally not forwarded — user-supplied per-axis slope arrays need a separate ND design. * test: ND forwarder coverage for pchip/cardinal/akima Smoke tests for the new `pchip_interp`/`cardinal_interp`/`akima_interp` ND forwarders. Verifies ULP-exact equivalence with the corresponding `interp(grids, data; method = ...)` calls across all four ND call shapes (build, scalar oneshot, batch oneshot, in-place batch) for 2D and 3D fixtures. Coverage matrix: - Equivalence (`===` / `==`) vs direct `interp` for every call shape - `@inferred` type stability on the scalar oneshot path - Cardinal `tension` kwarg passthrough (default vs 0.5, plus cross-check against `CardinalInterp(0.5)`) - Grid flavors: all-Vector, all-range, range×Vector, Vector×range — forwarder must preserve concrete grid tuple type into `interp` * Runic formatting * docs: address Copilot review on PR #114 — fix doc/code drift Two doc-only fixes flagged by Copilot review: 1. `src/hetero/local_hermite_nd_forward.jl`: header said "all three ND call shapes" but enumerated four (off-by-one). 2. `test/test_local_hermite_nd_forward.jl`: header claimed `===` for all four call shapes, but batch/array paths use `==` (array identity is meaningless). Clarified that scalar paths use `===`, array paths `==`. No code behavior change.
Summary
_collapse_dims: OnTheFly interpolant callable path +gradient/hessianbecome zero-alloc after warmup (intermediate arrays now viaacquire!instead ofArray(undef, ...)).cubic_interp/quadratic_interp/interp((x,y), data, q)now resolves toOnTheFly()— skips the partials build for a single point. Batch queries keepPreCompute.ForwardDiff.Dualsupport in OnTheFly:_collapse_dimsnow promotes its pool buffer type via_promote_query_eltype(Tv, q_eval), so Dual queries flow through the OnTheFly path directly (no PreCompute fallback, no performance penalty). This also fixes a pre-existing bug:itp((::Dual, ::Dual))on an OnTheFlyHeteroInterpolantNDpreviously crashed withMethodError: Float64(::Dual).Benchmark (scalar oneshot, median time)
A scalar one-shot evaluation never reuses any nodal derivative — there is no
amortization to gain from precomputing the full
2^N × n^Npartials array,so OnTheFly is fundamentally the right strategy for this access pattern.
The numbers below quantify the saved build cost.
cubic_interp2Dcubic_interp2Dcubic_interp3Dcubic_interp3Dquadratic_interp2DBoth paths are zero-alloc after warmup. Speedup grows with grid size (fixed-cost search/cell-location overhead amortizes) and dimensionality (partials build scales as
2^N × n^N).New Behavior
AutoCoeffs()is now the defaultcoeffskwarg incubic_interp,quadratic_interp, andinterpND oneshot APIs (wasPreCompute()).