Skip to content

Conversation

@lovesegfault
Copy link
Member

@lovesegfault lovesegfault commented Oct 28, 2025

Motivation

Introduces a reusable directed graph template built on Boost Graph Library
(BGL) to provide graph operations for store path dependency analysis. This
will be used by nix why-depends and future cycle detection.

Context

Another outcome of #14218, like #14360, this is a first step toward a generalized dependency graph analysis abstraction that we can use for both why-depends and detailed cycle analysis. In a follow-up PR, we will hook this up to scanForReferencesDeep which will populate FileListEdgePropertys. Then, this will become the graph machinery in why-depends.


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

Summary by CodeRabbit

  • New Features

    • Added dependency graph infrastructure for analyzing package relationships and file dependencies.
    • Supports customizable graph nodes and edge properties for flexible dependency tracking.
    • Enables distance computation and traversal utilities for dependency chains.
  • Tests

    • Added comprehensive test coverage for dependency graph operations.

@lovesegfault lovesegfault requested a review from xokdvium October 28, 2025 03:47
@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Oct 28, 2025
@lovesegfault lovesegfault force-pushed the bgl-depgraph-init branch 2 times, most recently from 1d0fd16 to bddc239 Compare October 28, 2025 05:33
@lovesegfault
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 28, 2025

Walkthrough

Adds a generic templated DependencyGraph (header + implementation), explicit template instantiations, build integration, and a Google Test suite exercising graph operations, traversal, and edge/property behaviors.

Changes

Cohort / File(s) Summary
Header + Impl
src/libstore/include/nix/store/dependency-graph.hh, src/libstore/include/nix/store/dependency-graph-impl.hh
Introduces DependencyGraph<NodeId, EdgeProperty> template, FileListEdgeProperty, typedefs (StorePathGraph, FilePathGraph, StorePathGraphWithFiles), and full template method implementations: vertex/edge management, edge-property merging, distance computation via Dijkstra, distance-cached DFS traversal ordered by distance, and node/edge query APIs.
Explicit Instantiations
src/libstore/dependency-graph.cc
Adds implementation file that includes the impl header and provides explicit instantiations for DependencyGraph<StorePath>, DependencyGraph<std::string>, and DependencyGraph<StorePath, FileListEdgeProperty>.
Build Files (libstore)
src/libstore/meson.build
Adds dependency-graph.cc to libstore sources so instantiated templates are compiled into the library.
Build Files (public headers)
src/libstore/include/nix/store/meson.build
Exports new headers dependency-graph.hh and dependency-graph-impl.hh as public include files.
Unit Tests
src/libstore-tests/dependency-graph.cc, src/libstore-tests/meson.build
New Google Test file exercising addEdge/hasNode/getSuccessors/getAllNodes/numVertices/dfsFromTarget behaviors, error cases (missing node), and empty graph; added to test build.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Graph
    participant Dijkstra
    participant DFS

    Caller->>Graph: dfsFromTarget(start, target, visitNode, visitEdge, shouldStop)
    activate Graph

    Graph->>Dijkstra: computeDistancesFrom(target)  %% compute distances on reversed graph
    activate Dijkstra
    Dijkstra-->>Graph: distances cached
    deactivate Dijkstra

    Graph->>DFS: start distance-ordered DFS from start
    activate DFS

    loop for each visited node
        DFS->>Graph: getSuccessors(node)
        DFS->>Caller: visitNode(node)
        loop for each successor (ordered by distance)
            DFS->>Caller: visitEdge(from,to,is_last)
            alt shouldStop == true
                DFS->>DFS: stop traversal
            end
        end
    end

    DFS-->>Graph: traversal complete
    deactivate DFS
    Graph-->>Caller: return
    deactivate Graph
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus review on:
    • dependency-graph-impl.hh — edge merging semantics for FileListEdgeProperty, distance caching correctness, and Dijkstra on reversed graph.
    • dfsFromTarget — ordering by distances, visitor call semantics, and stop-predicate behavior.
    • Public API/headers and exported header lists in Meson build files.
    • Test coverage in src/libstore-tests/dependency-graph.cc to ensure key behaviors are validated.

🐰 I stitched the paths with careful hops,
Edges and distances measured in tiny hops,
DFS dances, Dijkstra hums,
Files collect where merging comes,
A rabbit cheers for graph-filled crops! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "refactor(libstore): add BGL-based dependency graph for path analysis" accurately reflects the primary change in the changeset. The PR introduces a new templated DependencyGraph class implemented using Boost Graph Library, along with supporting implementation files, template instantiations, and tests. The title clearly identifies what is being added (BGL-based dependency graph), the scope (libstore), and the purpose (path analysis). It is specific, uses appropriate technical terminology, and avoids vague language or noise. A developer scanning the repository history would understand this PR adds infrastructure for dependency graph analysis.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (19)
src/libstore/meson.build (1)

280-280: Align instantiation strategy with header exposure.

You’re compiling explicit instantiations in dependency-graph.cc while also installing the -impl.hh publicly. This risks duplicate instantiations across TUs that include the impl. Recommend: add extern template declarations in dependency-graph.hh and have consumers include only the interface header; keep -impl.hh private. See related comments in headers/tests.

src/libstore/include/nix/store/meson.build (1)

34-35: Don’t install the impl header as public.

Exporting dependency-graph-impl.hh invites downstream re-instantiation and locks in implementation details. Prefer:

  • Keep only dependency-graph.hh public.
  • Make dependency-graph-impl.hh private (not installed).
  • Pair with extern template declarations in dependency-graph.hh and explicit instantiations in dependency-graph.cc.
src/libstore/dependency-graph.cc (1)

1-10: Include the interface header first and ensure complete types for instantiation.

Add dependency-graph.hh before impl and include to guarantee std::string is complete during explicit instantiation.

-#include "nix/store/dependency-graph-impl.hh"
+#include "nix/store/dependency-graph.hh"
+#include "nix/store/dependency-graph-impl.hh"
+#include <string>

Also pair these with extern template declarations in dependency-graph.hh to prevent duplicate instantiations in consumers that include the impl. See header comment.

src/libstore-tests/dependency-graph.cc (4)

1-4: Prefer the public interface header and include algorithms.

Include dependency-graph.hh (not -impl.hh) and add for ranges::contains.

-#include "nix/store/dependency-graph-impl.hh"
+#include "nix/store/dependency-graph.hh"
 #include <gtest/gtest.h>
+#include <algorithm>  // std::ranges::contains

If you keep explicit instantiations in the library, avoid including -impl.hh in tests to prevent redundant template instantiation.


27-28: Fix DFS comment to match behavior.

Comment says “B and C before recursing”, but the expectations allow immediate recursion (A→B→D). Update the comment to reflect “successors are considered in distance order; traversal remains DFS”.

-    // Successors should be visited in distance order (B and C before recursing)
+    // Successors are considered in distance order; traversal remains DFS (may recurse into B or C first)

49-53: Strengthen assertions for traversal.

Guard against out-of-bounds and assert that D is eventually visited.

-    EXPECT_EQ(visitedNodes[0], "a");
-    // B and C both at distance 1, could be in either order
-    EXPECT_TRUE(
-        (visitedNodes[1] == "b" && visitedNodes[2] == "d") || (visitedNodes[1] == "c" && visitedNodes[2] == "d"));
+    ASSERT_GE(visitedNodes.size(), 3u);
+    EXPECT_EQ(visitedNodes[0], "a");
+    // First child is either b or c, followed by d due to DFS.
+    EXPECT_TRUE(
+        (visitedNodes[1] == "b" && visitedNodes[2] == "d") ||
+        (visitedNodes[1] == "c" && visitedNodes[2] == "d"));
+    EXPECT_TRUE(std::ranges::contains(visitedNodes, "d"));

34-47: Optionally assert visitedEdges to lock traversal semantics.

You already collect visitedEdges; add minimal checks (contains (a,b) or (a,c) and an edge to d).

     std::vector<std::pair<std::string, std::string>> visitedEdges;
@@
-        [&](const std::string & from, const std::string & to, bool isLast, size_t depth) {
+        [&](const std::string & from, const std::string & to, bool /*isLast*/, size_t /*depth*/) {
             visitedEdges.emplace_back(from, to);
         },
@@
+    EXPECT_FALSE(visitedEdges.empty());
+    EXPECT_TRUE(std::ranges::contains(visitedEdges, std::make_pair(std::string("a"), std::string("b"))) ||
+                std::ranges::contains(visitedEdges, std::make_pair(std::string("a"), std::string("c"))));
+    EXPECT_TRUE(std::ranges::contains(visitedEdges, std::make_pair(std::string("b"), std::string("d"))) ||
+                std::ranges::contains(visitedEdges, std::make_pair(std::string("c"), std::string("d"))));
src/libstore/include/nix/store/dependency-graph.hh (6)

82-89: Document/guarantee edge merge semantics.

addEdge(from,to,prop) says “Merges property if edge exists.” Please specify merge semantics (e.g., FileListEdgeProperty: stable, deduped union with deterministic order) and ensure strong exception safety.


119-126: Consider [[nodiscard]] on traversal to prevent ignored calls.

dfsFromTarget returns void, fine; but for getters below, [[nodiscard]] avoids accidental ignore. See next comment.


130-143: Mark query methods [[nodiscard]] and clarify ordering.

getSuccessors/getAllNodes return vectors; consider [[nodiscard]] and document ordering (insertion, NodeId order, or distance). Determinism helps tests and users.

-    std::vector<NodeId> getSuccessors(const NodeId & node) const;
+    [[nodiscard]] std::vector<NodeId> getSuccessors(const NodeId & node) const;
@@
-    std::vector<NodeId> getAllNodes() const;
+    [[nodiscard]] std::vector<NodeId> getAllNodes() const;
@@
-    size_t numVertices() const
+    [[nodiscard]] size_t numVertices() const

135-137: Avoid copying heavy edge properties.

Returning std::optional by value can be expensive (e.g., FileListEdgeProperty). Consider returning a pointer or reference-like view:

-    std::optional<EdgeProperty> getEdgeProperty(const NodeId & from, const NodeId & to) const
+    const EdgeProperty * getEdgePropertyPtr(const NodeId & from, const NodeId & to) const
         requires(!std::same_as<EdgeProperty, boost::no_property>);

Keeps ABI simple and avoids copies.


149-152: Define deterministic ordering for files on an edge.

For FileListEdgeProperty::files, specify invariant (sorted, unique) to make merges deterministic and comparisons cheap.


155-158: Add extern template declarations to match explicit instantiations.

Prevents duplicate instantiations in consumers and speeds builds.

 using StorePathGraph = DependencyGraph<StorePath>;
 using FilePathGraph = DependencyGraph<std::string>;
 using StorePathGraphWithFiles = DependencyGraph<StorePath, FileListEdgeProperty>;
+
+// Provided by src/libstore/dependency-graph.cc
+extern template class DependencyGraph<StorePath>;
+extern template class DependencyGraph<std::string>;
+extern template class DependencyGraph<StorePath, FileListEdgeProperty>;
src/libstore/include/nix/store/dependency-graph-impl.hh (6)

19-22: Missing required Boost header for Dijkstra.

Relying on transitive includes is brittle. Explicitly include the algorithm header.

 #include <boost/graph/graph_traits.hpp>
 #include <boost/graph/reverse_graph.hpp>
 #include <boost/graph/properties.hpp>
+#include <boost/graph/dijkstra_shortest_paths.hpp>

28-37: Constructor builds graph with per-path queries; consider batched queries.

If Store performs RPCs, querying each path individually can be N round-trips. Prefer a batched query API (if available) or prefetch/caching to reduce latency.

If Store exposes a multi-get (e.g., queryPathInfos), wiring it here would drop latency for large closures.


102-112: Enrich error with NodeId.

Helps debugging when a node is missing.

-    if (!opt.has_value()) {
-        throw Error("node not found in graph");
-    }
+    if (!opt.has_value()) {
+        throw Error("node '{}' not found in graph", id);
+    }

169-178: Avoid std::ranges::to (C++23); use C++20-friendly collection.

std::ranges::to is C++23. Prefer an explicit vector build unless the project already mandates C++23.

-        auto sortedSuccessors = successors | std::views::transform([&](const auto & ref) -> std::pair<size_t, NodeId> {
-                                    auto v = getVertexOrThrow(ref);
-                                    return {(*cachedDistances)[v], ref};
-                                })
-                                | std::views::filter([](const auto & p) {
-                                      // Filter unreachable nodes
-                                      return p.first != std::numeric_limits<size_t>::max();
-                                  })
-                                | std::ranges::to<std::vector>();
+        std::vector<std::pair<size_t, NodeId>> sortedSuccessors;
+        sortedSuccessors.reserve(successors.size());
+        for (const auto &ref : successors) {
+            auto v = getVertexOrThrow(ref);
+            auto d = (*cachedDistances)[v];
+            if (d != std::numeric_limits<size_t>::max()) {
+                sortedSuccessors.emplace_back(d, ref);
+            }
+        }

199-207: getSuccessors uses std::ranges::to (C++23).

Return a vector without relying on C++23.

-    return std::ranges::subrange(adjBegin, adjEnd) | std::views::transform([&](auto v) { return getNodeId(v); })
-           | std::ranges::to<std::vector>();
+    std::vector<NodeId> out;
+    out.reserve(std::distance(adjBegin, adjEnd));
+    for (auto it = adjBegin; it != adjEnd; ++it) {
+        out.emplace_back(getNodeId(*it));
+    }
+    return out;

226-229: getAllNodes uses std::ranges::to (C++23).

Make it C++20-friendly.

-    return nodeToVertex | std::views::keys | std::ranges::to<std::vector>();
+    std::vector<NodeId> nodes;
+    nodes.reserve(nodeToVertex.size());
+    for (const auto &kv : nodeToVertex) {
+        nodes.emplace_back(kv.first);
+    }
+    return nodes;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd0d006 and bddc239.

📒 Files selected for processing (7)
  • src/libstore-tests/dependency-graph.cc (1 hunks)
  • src/libstore-tests/meson.build (1 hunks)
  • src/libstore/dependency-graph.cc (1 hunks)
  • src/libstore/include/nix/store/dependency-graph-impl.hh (1 hunks)
  • src/libstore/include/nix/store/dependency-graph.hh (1 hunks)
  • src/libstore/include/nix/store/meson.build (1 hunks)
  • src/libstore/meson.build (1 hunks)
🔇 Additional comments (1)
src/libstore-tests/meson.build (1)

59-59: LGTM.

Test is correctly wired into the suite.

Introduces a reusable directed graph template built on Boost Graph Library
(BGL) to provide graph operations for store path dependency analysis. This
will be used by `nix why-depends` and future cycle detection.
@lovesegfault
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/libstore/include/nix/store/dependency-graph-impl.hh (2)

180-181: Sort by distance only; avoid NodeId tie‑breaker coupling.

-        std::ranges::sort(sortedSuccessors);
+        std::ranges::sort(sortedSuccessors, [](const auto &a, const auto &b) {
+            return a.first < b.first;
+        });

156-198: DFS can loop on cycles; add visited tracking keyed by vertex index.

-    // DFS with distance-based ordering
-    std::function<bool(const NodeId &, size_t)> dfs = [&](const NodeId & node, size_t depth) -> bool {
+    // Validate start early for clearer errors
+    (void)getVertexOrThrow(start);
+
+    // DFS with distance-based ordering + cycle guard
+    std::vector<unsigned char> visited(n, 0); // by vertex index
+    std::function<bool(const NodeId &, size_t)> dfs = [&](const NodeId & node, size_t depth) -> bool {
+        auto vNode = getVertexOrThrow(node);
+        if (visited[vNode]) return false;
+        visited[vNode] = 1;
         // Visit node - if returns false, skip this subtree
         if (!visitNode(node, depth)) {
             return false;
         }
@@
-        std::ranges::sort(sortedSuccessors);
+        // Sort strictly by distance; do not rely on NodeId ordering
+        std::ranges::sort(sortedSuccessors, [](const auto &a, const auto &b) {
+            return a.first < b.first;
+        });
🧹 Nitpick comments (10)
src/libstore/include/nix/store/dependency-graph.hh (4)

7-10: Trim heavy BGL includes from the public header.

Move algorithm headers to the impl to reduce rebuild surface and speed up consumers.

 #include <boost/graph/adjacency_list.hpp>
-#include <boost/graph/dijkstra_shortest_paths.hpp>
-#include <boost/graph/reverse_graph.hpp>

Add them in dependency-graph-impl.hh (see related comment there).


24-26: Confirm the need for std::totally_ordered on NodeId.

Given pair sorting in dfs tie‑breaks on NodeId, the constraint is coherent. If you switch to sorting strictly by distance (see impl note), you could relax this concept later.


141-143: Avoid copying potentially large edge properties; add a pointer/borrowed accessor.

Provide a zero‑copy accessor alongside the existing API.

   [[nodiscard]] std::optional<EdgeProperty> getEdgeProperty(const NodeId & from, const NodeId & to) const
       requires(!std::same_as<EdgeProperty, boost::no_property>);
+
+  /**
+   * Non-owning view of the edge property, or nullptr if absent.
+   */
+  [[nodiscard]] const EdgeProperty * getEdgePropertyPtr(const NodeId & from, const NodeId & to) const
+      requires(!std::same_as<EdgeProperty, boost::no_property>);

93-95: Optional: rvalue overload for property to avoid copies on large props.

   void addEdge(const NodeId & from, const NodeId & to, const EdgeProperty & prop)
       requires(!std::same_as<EdgeProperty, boost::no_property>);
+  void addEdge(const NodeId & from, const NodeId & to, EdgeProperty && prop)
+      requires(!std::same_as<EdgeProperty, boost::no_property>);
src/libstore/include/nix/store/dependency-graph-impl.hh (6)

19-22: Add dijkstra header here (since it’s removed from the public header).

 #include <boost/graph/graph_traits.hpp>
 #include <boost/graph/reverse_graph.hpp>
 #include <boost/graph/properties.hpp>
+#include <boost/graph/dijkstra_shortest_paths.hpp>

28-37: Potential N+1 store queries; consider batching.

queryPathInfo(path) per element may be costly. If Store API supports a batched refs query, prefer that for large closures.


135-141: Validate start exists before traversal for earlier failure.

     // Compute distances locally for this traversal
     auto targetVertex = getVertexOrThrow(target);
     size_t n = boost::num_vertices(graph);
+    // Fail fast if start is absent
+    (void)getVertexOrThrow(start);

67-89: Add rvalue overload to avoid property copies on insert/merge.

 template<GraphNodeId NodeId, typename EdgeProperty>
 void DependencyGraph<NodeId, EdgeProperty>::addEdge(const NodeId & from, const NodeId & to, const EdgeProperty & prop)
     requires(!std::same_as<EdgeProperty, boost::no_property>)
 {
@@
 }
+
+template<GraphNodeId NodeId, typename EdgeProperty>
+void DependencyGraph<NodeId, EdgeProperty>::addEdge(const NodeId & from, const NodeId & to, EdgeProperty && prop)
+    requires(!std::same_as<EdgeProperty, boost::no_property>)
+{
+    auto vFrom = addOrGetVertex(from);
+    auto vTo = addOrGetVertex(to);
+    auto [existingEdge, found] = boost::edge(vFrom, vTo, graph);
+    if (found) {
+        if constexpr (std::same_as<EdgeProperty, FileListEdgeProperty>) {
+            auto & edgeFiles = graph[existingEdge].files;
+            edgeFiles.insert(prop.files.begin(), prop.files.end());
+        } else {
+            graph[existingEdge] = std::move(prop);
+        }
+    } else {
+        boost::add_edge(vFrom, vTo, std::move(prop), graph);
+    }
+}

210-224: Implement zero‑copy edge property accessor proposed in the header.

 template<GraphNodeId NodeId, typename EdgeProperty>
 std::optional<EdgeProperty>
 DependencyGraph<NodeId, EdgeProperty>::getEdgeProperty(const NodeId & from, const NodeId & to) const
     requires(!std::same_as<EdgeProperty, boost::no_property>)
 {
@@
     return graph[edge];
 }
+
+template<GraphNodeId NodeId, typename EdgeProperty>
+const EdgeProperty *
+DependencyGraph<NodeId, EdgeProperty>::getEdgePropertyPtr(const NodeId & from, const NodeId & to) const
+    requires(!std::same_as<EdgeProperty, boost::no_property>)
+{
+    auto vFrom = getVertexOrThrow(from);
+    auto vTo = getVertexOrThrow(to);
+    auto [edge, found] = boost::edge(vFrom, vTo, graph);
+    return found ? &graph[edge] : nullptr;
+}

135-156: Optional: replace Dijkstra (uniform weights) with BFS on the reversed graph for O(V+E).

I can provide a BFS patch if desired.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bddc239 and 501c928.

📒 Files selected for processing (7)
  • src/libstore-tests/dependency-graph.cc (1 hunks)
  • src/libstore-tests/meson.build (1 hunks)
  • src/libstore/dependency-graph.cc (1 hunks)
  • src/libstore/include/nix/store/dependency-graph-impl.hh (1 hunks)
  • src/libstore/include/nix/store/dependency-graph.hh (1 hunks)
  • src/libstore/include/nix/store/meson.build (1 hunks)
  • src/libstore/meson.build (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/libstore-tests/meson.build
  • src/libstore/dependency-graph.cc
  • src/libstore/meson.build
  • src/libstore/include/nix/store/meson.build
  • src/libstore-tests/dependency-graph.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests on ubuntu (with sanitizers / coverage)
  • GitHub Check: tests on macos
  • GitHub Check: tests on ubuntu
🔇 Additional comments (1)
src/libstore/include/nix/store/dependency-graph.hh (1)

33-37: Doc ↔ impl aligned on idempotency and merge policy.

Idempotent addEdge and FileListEdgeProperty dedup/merge semantics match the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

store Issues and pull requests concerning the Nix store

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant