- 
          
 - 
                Notifications
    
You must be signed in to change notification settings  - Fork 1.8k
 
refactor(libstore): add BGL-based dependency graph for path analysis #14392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
d043aaf    to
    88dda19      
    Compare
  
    1d0fd16    to
    bddc239      
    Compare
  
    | 
           @coderabbitai review  | 
    
          
✅ Actions performedReview triggered. 
  | 
    
          
WalkthroughAdds a generic templated DependencyGraph (header + implementation), explicit template instantiations, build integration, and a Google Test suite exercising graph operations, traversal, and edge/property behaviors. Changes
 Sequence Diagram(s)sequenceDiagram
    participant Caller
    participant Graph
    participant Dijkstra
    participant DFS
    Caller->>Graph: dfsFromTarget(start, target, visitNode, visitEdge, shouldStop)
    activate Graph
    Graph->>Dijkstra: computeDistancesFrom(target)  %% compute distances on reversed graph
    activate Dijkstra
    Dijkstra-->>Graph: distances cached
    deactivate Dijkstra
    Graph->>DFS: start distance-ordered DFS from start
    activate DFS
    loop for each visited node
        DFS->>Graph: getSuccessors(node)
        DFS->>Caller: visitNode(node)
        loop for each successor (ordered by distance)
            DFS->>Caller: visitEdge(from,to,is_last)
            alt shouldStop == true
                DFS->>DFS: stop traversal
            end
        end
    end
    DFS-->>Graph: traversal complete
    deactivate DFS
    Graph-->>Caller: return
    deactivate Graph
    Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 
 
 Pre-merge checks and finishing touches❌ Failed checks (1 warning)
 ✅ Passed checks (2 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
 Comment   | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (19)
src/libstore/meson.build (1)
280-280: Align instantiation strategy with header exposure.You’re compiling explicit instantiations in dependency-graph.cc while also installing the -impl.hh publicly. This risks duplicate instantiations across TUs that include the impl. Recommend: add extern template declarations in dependency-graph.hh and have consumers include only the interface header; keep -impl.hh private. See related comments in headers/tests.
src/libstore/include/nix/store/meson.build (1)
34-35: Don’t install the impl header as public.Exporting dependency-graph-impl.hh invites downstream re-instantiation and locks in implementation details. Prefer:
- Keep only dependency-graph.hh public.
 - Make dependency-graph-impl.hh private (not installed).
 - Pair with extern template declarations in dependency-graph.hh and explicit instantiations in dependency-graph.cc.
 src/libstore/dependency-graph.cc (1)
1-10: Include the interface header first and ensure complete types for instantiation.Add dependency-graph.hh before impl and include to guarantee std::string is complete during explicit instantiation.
-#include "nix/store/dependency-graph-impl.hh" +#include "nix/store/dependency-graph.hh" +#include "nix/store/dependency-graph-impl.hh" +#include <string>Also pair these with extern template declarations in dependency-graph.hh to prevent duplicate instantiations in consumers that include the impl. See header comment.
src/libstore-tests/dependency-graph.cc (4)
1-4: Prefer the public interface header and include algorithms.Include dependency-graph.hh (not -impl.hh) and add for ranges::contains.
-#include "nix/store/dependency-graph-impl.hh" +#include "nix/store/dependency-graph.hh" #include <gtest/gtest.h> +#include <algorithm> // std::ranges::containsIf you keep explicit instantiations in the library, avoid including -impl.hh in tests to prevent redundant template instantiation.
27-28: Fix DFS comment to match behavior.Comment says “B and C before recursing”, but the expectations allow immediate recursion (A→B→D). Update the comment to reflect “successors are considered in distance order; traversal remains DFS”.
- // Successors should be visited in distance order (B and C before recursing) + // Successors are considered in distance order; traversal remains DFS (may recurse into B or C first)
49-53: Strengthen assertions for traversal.Guard against out-of-bounds and assert that D is eventually visited.
- EXPECT_EQ(visitedNodes[0], "a"); - // B and C both at distance 1, could be in either order - EXPECT_TRUE( - (visitedNodes[1] == "b" && visitedNodes[2] == "d") || (visitedNodes[1] == "c" && visitedNodes[2] == "d")); + ASSERT_GE(visitedNodes.size(), 3u); + EXPECT_EQ(visitedNodes[0], "a"); + // First child is either b or c, followed by d due to DFS. + EXPECT_TRUE( + (visitedNodes[1] == "b" && visitedNodes[2] == "d") || + (visitedNodes[1] == "c" && visitedNodes[2] == "d")); + EXPECT_TRUE(std::ranges::contains(visitedNodes, "d"));
34-47: Optionally assert visitedEdges to lock traversal semantics.You already collect visitedEdges; add minimal checks (contains (a,b) or (a,c) and an edge to d).
std::vector<std::pair<std::string, std::string>> visitedEdges; @@ - [&](const std::string & from, const std::string & to, bool isLast, size_t depth) { + [&](const std::string & from, const std::string & to, bool /*isLast*/, size_t /*depth*/) { visitedEdges.emplace_back(from, to); }, @@ + EXPECT_FALSE(visitedEdges.empty()); + EXPECT_TRUE(std::ranges::contains(visitedEdges, std::make_pair(std::string("a"), std::string("b"))) || + std::ranges::contains(visitedEdges, std::make_pair(std::string("a"), std::string("c")))); + EXPECT_TRUE(std::ranges::contains(visitedEdges, std::make_pair(std::string("b"), std::string("d"))) || + std::ranges::contains(visitedEdges, std::make_pair(std::string("c"), std::string("d"))));src/libstore/include/nix/store/dependency-graph.hh (6)
82-89: Document/guarantee edge merge semantics.addEdge(from,to,prop) says “Merges property if edge exists.” Please specify merge semantics (e.g., FileListEdgeProperty: stable, deduped union with deterministic order) and ensure strong exception safety.
119-126: Consider [[nodiscard]] on traversal to prevent ignored calls.dfsFromTarget returns void, fine; but for getters below, [[nodiscard]] avoids accidental ignore. See next comment.
130-143: Mark query methods [[nodiscard]] and clarify ordering.getSuccessors/getAllNodes return vectors; consider [[nodiscard]] and document ordering (insertion, NodeId order, or distance). Determinism helps tests and users.
- std::vector<NodeId> getSuccessors(const NodeId & node) const; + [[nodiscard]] std::vector<NodeId> getSuccessors(const NodeId & node) const; @@ - std::vector<NodeId> getAllNodes() const; + [[nodiscard]] std::vector<NodeId> getAllNodes() const; @@ - size_t numVertices() const + [[nodiscard]] size_t numVertices() const
135-137: Avoid copying heavy edge properties.Returning std::optional by value can be expensive (e.g., FileListEdgeProperty). Consider returning a pointer or reference-like view:
- std::optional<EdgeProperty> getEdgeProperty(const NodeId & from, const NodeId & to) const + const EdgeProperty * getEdgePropertyPtr(const NodeId & from, const NodeId & to) const requires(!std::same_as<EdgeProperty, boost::no_property>);Keeps ABI simple and avoids copies.
149-152: Define deterministic ordering for files on an edge.For FileListEdgeProperty::files, specify invariant (sorted, unique) to make merges deterministic and comparisons cheap.
155-158: Add extern template declarations to match explicit instantiations.Prevents duplicate instantiations in consumers and speeds builds.
using StorePathGraph = DependencyGraph<StorePath>; using FilePathGraph = DependencyGraph<std::string>; using StorePathGraphWithFiles = DependencyGraph<StorePath, FileListEdgeProperty>; + +// Provided by src/libstore/dependency-graph.cc +extern template class DependencyGraph<StorePath>; +extern template class DependencyGraph<std::string>; +extern template class DependencyGraph<StorePath, FileListEdgeProperty>;src/libstore/include/nix/store/dependency-graph-impl.hh (6)
19-22: Missing required Boost header for Dijkstra.Relying on transitive includes is brittle. Explicitly include the algorithm header.
#include <boost/graph/graph_traits.hpp> #include <boost/graph/reverse_graph.hpp> #include <boost/graph/properties.hpp> +#include <boost/graph/dijkstra_shortest_paths.hpp>
28-37: Constructor builds graph with per-path queries; consider batched queries.If Store performs RPCs, querying each path individually can be N round-trips. Prefer a batched query API (if available) or prefetch/caching to reduce latency.
If Store exposes a multi-get (e.g., queryPathInfos), wiring it here would drop latency for large closures.
102-112: Enrich error with NodeId.Helps debugging when a node is missing.
- if (!opt.has_value()) { - throw Error("node not found in graph"); - } + if (!opt.has_value()) { + throw Error("node '{}' not found in graph", id); + }
169-178: Avoid std::ranges::to (C++23); use C++20-friendly collection.std::ranges::to is C++23. Prefer an explicit vector build unless the project already mandates C++23.
- auto sortedSuccessors = successors | std::views::transform([&](const auto & ref) -> std::pair<size_t, NodeId> { - auto v = getVertexOrThrow(ref); - return {(*cachedDistances)[v], ref}; - }) - | std::views::filter([](const auto & p) { - // Filter unreachable nodes - return p.first != std::numeric_limits<size_t>::max(); - }) - | std::ranges::to<std::vector>(); + std::vector<std::pair<size_t, NodeId>> sortedSuccessors; + sortedSuccessors.reserve(successors.size()); + for (const auto &ref : successors) { + auto v = getVertexOrThrow(ref); + auto d = (*cachedDistances)[v]; + if (d != std::numeric_limits<size_t>::max()) { + sortedSuccessors.emplace_back(d, ref); + } + }
199-207: getSuccessors uses std::ranges::to (C++23).Return a vector without relying on C++23.
- return std::ranges::subrange(adjBegin, adjEnd) | std::views::transform([&](auto v) { return getNodeId(v); }) - | std::ranges::to<std::vector>(); + std::vector<NodeId> out; + out.reserve(std::distance(adjBegin, adjEnd)); + for (auto it = adjBegin; it != adjEnd; ++it) { + out.emplace_back(getNodeId(*it)); + } + return out;
226-229: getAllNodes uses std::ranges::to (C++23).Make it C++20-friendly.
- return nodeToVertex | std::views::keys | std::ranges::to<std::vector>(); + std::vector<NodeId> nodes; + nodes.reserve(nodeToVertex.size()); + for (const auto &kv : nodeToVertex) { + nodes.emplace_back(kv.first); + } + return nodes;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/libstore-tests/dependency-graph.cc(1 hunks)src/libstore-tests/meson.build(1 hunks)src/libstore/dependency-graph.cc(1 hunks)src/libstore/include/nix/store/dependency-graph-impl.hh(1 hunks)src/libstore/include/nix/store/dependency-graph.hh(1 hunks)src/libstore/include/nix/store/meson.build(1 hunks)src/libstore/meson.build(1 hunks)
🔇 Additional comments (1)
src/libstore-tests/meson.build (1)
59-59: LGTM.Test is correctly wired into the suite.
Introduces a reusable directed graph template built on Boost Graph Library (BGL) to provide graph operations for store path dependency analysis. This will be used by `nix why-depends` and future cycle detection.
bddc239    to
    501c928      
    Compare
  
    | 
           @coderabbitai review  | 
    
          
✅ Actions performedReview triggered. 
  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/libstore/include/nix/store/dependency-graph-impl.hh (2)
180-181: Sort by distance only; avoid NodeId tie‑breaker coupling.- std::ranges::sort(sortedSuccessors); + std::ranges::sort(sortedSuccessors, [](const auto &a, const auto &b) { + return a.first < b.first; + });
156-198: DFS can loop on cycles; add visited tracking keyed by vertex index.- // DFS with distance-based ordering - std::function<bool(const NodeId &, size_t)> dfs = [&](const NodeId & node, size_t depth) -> bool { + // Validate start early for clearer errors + (void)getVertexOrThrow(start); + + // DFS with distance-based ordering + cycle guard + std::vector<unsigned char> visited(n, 0); // by vertex index + std::function<bool(const NodeId &, size_t)> dfs = [&](const NodeId & node, size_t depth) -> bool { + auto vNode = getVertexOrThrow(node); + if (visited[vNode]) return false; + visited[vNode] = 1; // Visit node - if returns false, skip this subtree if (!visitNode(node, depth)) { return false; } @@ - std::ranges::sort(sortedSuccessors); + // Sort strictly by distance; do not rely on NodeId ordering + std::ranges::sort(sortedSuccessors, [](const auto &a, const auto &b) { + return a.first < b.first; + });
🧹 Nitpick comments (10)
src/libstore/include/nix/store/dependency-graph.hh (4)
7-10: Trim heavy BGL includes from the public header.Move algorithm headers to the impl to reduce rebuild surface and speed up consumers.
#include <boost/graph/adjacency_list.hpp> -#include <boost/graph/dijkstra_shortest_paths.hpp> -#include <boost/graph/reverse_graph.hpp>Add them in dependency-graph-impl.hh (see related comment there).
24-26: Confirm the need forstd::totally_orderedon NodeId.Given pair sorting in dfs tie‑breaks on NodeId, the constraint is coherent. If you switch to sorting strictly by distance (see impl note), you could relax this concept later.
141-143: Avoid copying potentially large edge properties; add a pointer/borrowed accessor.Provide a zero‑copy accessor alongside the existing API.
[[nodiscard]] std::optional<EdgeProperty> getEdgeProperty(const NodeId & from, const NodeId & to) const requires(!std::same_as<EdgeProperty, boost::no_property>); + + /** + * Non-owning view of the edge property, or nullptr if absent. + */ + [[nodiscard]] const EdgeProperty * getEdgePropertyPtr(const NodeId & from, const NodeId & to) const + requires(!std::same_as<EdgeProperty, boost::no_property>);
93-95: Optional: rvalue overload for property to avoid copies on large props.void addEdge(const NodeId & from, const NodeId & to, const EdgeProperty & prop) requires(!std::same_as<EdgeProperty, boost::no_property>); + void addEdge(const NodeId & from, const NodeId & to, EdgeProperty && prop) + requires(!std::same_as<EdgeProperty, boost::no_property>);src/libstore/include/nix/store/dependency-graph-impl.hh (6)
19-22: Add dijkstra header here (since it’s removed from the public header).#include <boost/graph/graph_traits.hpp> #include <boost/graph/reverse_graph.hpp> #include <boost/graph/properties.hpp> +#include <boost/graph/dijkstra_shortest_paths.hpp>
28-37: Potential N+1 store queries; consider batching.
queryPathInfo(path)per element may be costly. If Store API supports a batched refs query, prefer that for large closures.
135-141: Validatestartexists before traversal for earlier failure.// Compute distances locally for this traversal auto targetVertex = getVertexOrThrow(target); size_t n = boost::num_vertices(graph); + // Fail fast if start is absent + (void)getVertexOrThrow(start);
67-89: Add rvalue overload to avoid property copies on insert/merge.template<GraphNodeId NodeId, typename EdgeProperty> void DependencyGraph<NodeId, EdgeProperty>::addEdge(const NodeId & from, const NodeId & to, const EdgeProperty & prop) requires(!std::same_as<EdgeProperty, boost::no_property>) { @@ } + +template<GraphNodeId NodeId, typename EdgeProperty> +void DependencyGraph<NodeId, EdgeProperty>::addEdge(const NodeId & from, const NodeId & to, EdgeProperty && prop) + requires(!std::same_as<EdgeProperty, boost::no_property>) +{ + auto vFrom = addOrGetVertex(from); + auto vTo = addOrGetVertex(to); + auto [existingEdge, found] = boost::edge(vFrom, vTo, graph); + if (found) { + if constexpr (std::same_as<EdgeProperty, FileListEdgeProperty>) { + auto & edgeFiles = graph[existingEdge].files; + edgeFiles.insert(prop.files.begin(), prop.files.end()); + } else { + graph[existingEdge] = std::move(prop); + } + } else { + boost::add_edge(vFrom, vTo, std::move(prop), graph); + } +}
210-224: Implement zero‑copy edge property accessor proposed in the header.template<GraphNodeId NodeId, typename EdgeProperty> std::optional<EdgeProperty> DependencyGraph<NodeId, EdgeProperty>::getEdgeProperty(const NodeId & from, const NodeId & to) const requires(!std::same_as<EdgeProperty, boost::no_property>) { @@ return graph[edge]; } + +template<GraphNodeId NodeId, typename EdgeProperty> +const EdgeProperty * +DependencyGraph<NodeId, EdgeProperty>::getEdgePropertyPtr(const NodeId & from, const NodeId & to) const + requires(!std::same_as<EdgeProperty, boost::no_property>) +{ + auto vFrom = getVertexOrThrow(from); + auto vTo = getVertexOrThrow(to); + auto [edge, found] = boost::edge(vFrom, vTo, graph); + return found ? &graph[edge] : nullptr; +}
135-156: Optional: replace Dijkstra (uniform weights) with BFS on the reversed graph for O(V+E).I can provide a BFS patch if desired.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/libstore-tests/dependency-graph.cc(1 hunks)src/libstore-tests/meson.build(1 hunks)src/libstore/dependency-graph.cc(1 hunks)src/libstore/include/nix/store/dependency-graph-impl.hh(1 hunks)src/libstore/include/nix/store/dependency-graph.hh(1 hunks)src/libstore/include/nix/store/meson.build(1 hunks)src/libstore/meson.build(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/libstore-tests/meson.build
 - src/libstore/dependency-graph.cc
 - src/libstore/meson.build
 - src/libstore/include/nix/store/meson.build
 - src/libstore-tests/dependency-graph.cc
 
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: tests on ubuntu (with sanitizers / coverage)
 - GitHub Check: tests on macos
 - GitHub Check: tests on ubuntu
 
🔇 Additional comments (1)
src/libstore/include/nix/store/dependency-graph.hh (1)
33-37: Doc ↔ impl aligned on idempotency and merge policy.Idempotent addEdge and FileListEdgeProperty dedup/merge semantics match the implementation.
Motivation
Introduces a reusable directed graph template built on Boost Graph Library
(BGL) to provide graph operations for store path dependency analysis. This
will be used by
nix why-dependsand future cycle detection.Context
Another outcome of #14218, like #14360, this is a first step toward a generalized dependency graph analysis abstraction that we can use for both
why-dependsand detailed cycle analysis. In a follow-up PR, we will hook this up toscanForReferencesDeepwhich will populateFileListEdgePropertys. Then, this will become the graph machinery inwhy-depends.Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.
Summary by CodeRabbit
New Features
Tests