Collaborator

@daniel-noland daniel-noland commented Nov 3, 2025

On top of #963.

This is up for early review to facilitate cooperation and minimize merge conflicts.

It is still a little too messy to merge, but it does deserve discussion.

At the same time, rebasing onto this is

  1. likely necessary
  2. likely very messy

but I don't really know what to do about that at this point.

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch 2 times, most recently from 9193aa4 to 5b4d4cc Compare November 3, 2025 02:18
@daniel-noland daniel-noland changed the base branch from main to pr/daniel-noland/release-prep-part-1 November 3, 2025 02:19
@daniel-noland daniel-noland marked this pull request as ready for review November 3, 2025 02:19
@daniel-noland daniel-noland requested a review from a team as a code owner November 3, 2025 02:19
@daniel-noland daniel-noland requested review from sergeymatov and removed request for a team November 3, 2025 02:19
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch from 5b4d4cc to a25b971 Compare November 3, 2025 02:20
@daniel-noland daniel-noland changed the title from "Release prep part 2" to "Release prep part 2: use dpdk (will break the pipeline if merged before unified with part 3)" Nov 3, 2025
@daniel-noland daniel-noland self-assigned this Nov 3, 2025
@daniel-noland daniel-noland added the enhancement label (New feature or request) Nov 3, 2025
@daniel-noland daniel-noland added this to the GW R1 milestone Nov 3, 2025
@daniel-noland daniel-noland mentioned this pull request Nov 3, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the dataplane initialization process by introducing a two-stage initialization system with a dedicated dataplane-init binary and secure configuration passing via sealed memory file descriptors. The key changes include moving default constants to a centralized location, implementing type-safe configuration structures with serialization, and establishing a more robust launch architecture.
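
For readers unfamiliar with sealed memfds, the following is a minimal, Linux-only sketch of the general technique, not the code from this PR. The helper names `sealed_memfd` and `current_seals` are hypothetical, the FFI declarations are written by hand so the example needs no external crates, and it assumes a toolchain recent enough to accept `unsafe extern` blocks. The numeric constants are the values from the Linux UAPI headers.

```rust
use std::ffi::CString;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::fd::{AsRawFd, FromRawFd};
use std::os::raw::{c_char, c_int, c_uint};

// Hand-written Linux FFI declarations so the sketch needs no external crates.
unsafe extern "C" {
    fn memfd_create(name: *const c_char, flags: c_uint) -> c_int;
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

// Values from the Linux UAPI headers (linux/memfd.h, linux/fcntl.h).
const MFD_CLOEXEC: c_uint = 0x0001;
const MFD_ALLOW_SEALING: c_uint = 0x0002;
const F_ADD_SEALS: c_int = 1033;
const F_GET_SEALS: c_int = 1034;
const F_SEAL_SEAL: c_int = 0x0001;
const F_SEAL_SHRINK: c_int = 0x0002;
const F_SEAL_GROW: c_int = 0x0004;
const F_SEAL_WRITE: c_int = 0x0008;
const ALL_SEALS: c_int = F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE;

/// Create an anonymous in-memory file, fill it, then seal it against further changes.
/// Hypothetical helper, not the PR's API.
fn sealed_memfd(name: &str, contents: &[u8]) -> io::Result<File> {
    let c_name =
        CString::new(name).map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    // SAFETY: `c_name` is a valid NUL-terminated string for the duration of the call.
    let fd = unsafe { memfd_create(c_name.as_ptr(), MFD_CLOEXEC | MFD_ALLOW_SEALING) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: `fd` was just returned by the kernel and nothing else owns it.
    let mut file = unsafe { File::from_raw_fd(fd) };
    file.write_all(contents)?;
    // SAFETY: `fd` is still open because `file` owns it and is alive.
    if unsafe { fcntl(fd, F_ADD_SEALS, ALL_SEALS) } < 0 {
        return Err(io::Error::last_os_error());
    }
    // Rewind so a reader starts at the beginning of the payload.
    file.seek(SeekFrom::Start(0))?;
    Ok(file)
}

/// Query which seals are currently set on the file.
fn current_seals(file: &File) -> io::Result<c_int> {
    // SAFETY: the descriptor is valid for as long as `file` is borrowed.
    let seals = unsafe { fcntl(file.as_raw_fd(), F_GET_SEALS) };
    if seals < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(seals)
    }
}

fn main() -> io::Result<()> {
    let mut file = sealed_memfd("launch-config", b"example payload")?;
    let mut text = String::new();
    file.read_to_string(&mut text)?;
    println!("read back: {text}");
    println!("write after sealing fails: {}", file.write_all(b"x").is_err());
    println!("all seals set: {}", current_seals(&file)? == ALL_SEALS);
    Ok(())
}
```

Note that a descriptor created with `MFD_CLOEXEC` is closed on exec, so a launcher that wants a child process to inherit it has to clear that flag (or duplicate the descriptor) before spawning.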

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| routing/src/router.rs | Updates import paths to use default constants from args module |
| routing/src/rio.rs | Removes default constant definitions now centralized in args module |
| routing/Cargo.toml | Adds args workspace dependency |
| mgmt/src/processor/launch.rs | Updates imports and uses GrpcAddress from args module, removes duplicate type definition |
| mgmt/Cargo.toml | Adds args workspace dependency |
| init/src/main.rs | Complete rewrite implementing new two-stage initialization with hardware scanning and config passing |
| init/Cargo.toml | Adds dependencies for new initialization features |
| dataplane/src/main.rs | Updates to receive configuration via file descriptors instead of command-line parsing |
| dataplane/src/drivers/dpdk.rs | Removes unused init_eal function, updates DriverDpdk::start to return resources |
| dataplane/Cargo.toml | Adds rkyv dependency |
| args/src/lib.rs | Major expansion with secure config types, memfd handling, and LaunchConfiguration structure |
| args/Cargo.toml | Adds numerous dependencies for new functionality |
| Dockerfile | Updates entrypoint to use dataplane-init |
| Cargo.lock | Dependency updates including new crates |
Comments suppressed due to low confidence (1)

dataplane/src/drivers/dpdk.rs:37

  • The init_eal function appears to be dead code that was replaced by direct calls to eal::init. Consider removing it to avoid confusion and reduce the maintenance burden.
fn init_eal(args: impl IntoIterator<Item = impl AsRef<str>>) -> Eal {
    let rte = eal::init(args);
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::DEBUG)
        .with_target(true)
        .with_thread_ids(true)
        .with_line_number(true)
        .with_thread_names(true)
        .init();
    rte
}

init/src/main.rs Outdated

#![doc = include_str!("../README.md")]
#![deny(clippy::pedantic, missing_docs)]
// #![deny(clippy::pedantic, missing_docs)] // TEMP: don't merge till uncommented
Copilot AI Nov 3, 2025

This comment explicitly states the code should not be merged until the lint/doc checks are uncommented, which contradicts the PR being ready for review. This temporary bypass should be addressed before merging.

Suggested change
// #![deny(clippy::pedantic, missing_docs)] // TEMP: don't merge till uncommented
#![deny(clippy::pedantic, missing_docs)]

.into_diagnostic()
.wrap_err("failed to serialize launch configuration as yaml")
.unwrap();
info!("interpreted requested lanunch configuration as\n---\n{launch_config_yaml}");
Copilot AI Nov 3, 2025

Spelling error: 'lanunch' should be 'launch'.

Suggested change
info!("interpreted requested lanunch configuration as\n---\n{launch_config_yaml}");
info!("interpreted requested launch configuration as\n---\n{launch_config_yaml}");

init/src/main.rs Outdated
let launch_config = launch_config.to_owned_fd();

let io_err = std::process::Command::new(
"/home/dnoland/code/githedgehog/dataplane/target/x86_64-unknown-linux-gnu/debug/dataplane",
Copilot AI Nov 3, 2025

Hardcoded absolute path to a user's home directory. This will fail in any other environment. The path should be made configurable, for example via a relative path or an environment variable.
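
One possible shape for such a fix, as a sketch only: look for the dataplane binary next to the running init executable, with an optional override. The function `resolve_dataplane_binary` and the environment variable name `DATAPLANE_BIN` are hypothetical and do not come from this PR.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical environment variable name; not taken from the PR.
const DATAPLANE_BIN_ENV: &str = "DATAPLANE_BIN";

/// Pick the dataplane binary: an explicit override wins, otherwise assume the
/// binary is installed in the same directory as the init executable.
fn resolve_dataplane_binary(override_path: Option<PathBuf>, init_exe: &Path) -> PathBuf {
    if let Some(path) = override_path {
        return path;
    }
    init_exe
        .parent()
        .map(|dir| dir.join("dataplane"))
        // No parent directory: fall back to a bare name resolved via PATH.
        .unwrap_or_else(|| PathBuf::from("dataplane"))
}

fn main() {
    let override_path = env::var_os(DATAPLANE_BIN_ENV).map(PathBuf::from);
    let init_exe = env::current_exe().expect("failed to determine the current executable");
    let binary = resolve_dataplane_binary(override_path, &init_exe);
    println!("would launch {}", binary.display());
}
```

Keeping the lookup in a pure function that takes the override and the executable path as parameters makes it testable without touching the process environment.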

Comment on lines 14 to 16
use tokio::io;
use tokio::net::UnixListener;
use tokio::sync::mpsc::Sender;
Copilot AI Nov 3, 2025

[nitpick] The import of tokio::io on line 14 is split from the other tokio imports on lines 15-16. Consider grouping all tokio imports together for better readability.

Suggested change
use tokio::io;
use tokio::net::UnixListener;
use tokio::sync::mpsc::Sender;
use tokio::net::UnixListener;
use tokio::sync::mpsc::Sender;
use tokio::io;

GrpcAddress::Tcp(addr) => ServerAddress::Tcp(addr),
GrpcAddress::UnixSocket(path) => ServerAddress::Unix(path.to_path_buf()),
args::GrpcAddress::Tcp(addr) => ServerAddress::Tcp(addr),
args::GrpcAddress::UnixSocket(path) => ServerAddress::Unix(path.into()),
Copilot AI Nov 3, 2025

[nitpick] The change from path.to_path_buf() to path.into() is less explicit about the type conversion taking place. While functionally equivalent, the original made the conversion to PathBuf clearer.

Suggested change
args::GrpcAddress::UnixSocket(path) => ServerAddress::Unix(path.into()),
args::GrpcAddress::UnixSocket(path) => ServerAddress::Unix(path.to_path_buf()),

Comment on lines +80 to +68
// if args.tracing_config_generate() {
// let out = get_trace_ctl()
// .as_config_string()
// .unwrap_or_else(|e| e.to_string());
// println!("{out}");
// std::process::exit(0);
// }
Copilot AI Nov 3, 2025

Commented-out code should be removed rather than left in the codebase. If this functionality is needed in the future, it can be recovered from version control.

Suggested change
// if args.tracing_config_generate() {
// let out = get_trace_ctl()
// .as_config_string()
// .unwrap_or_else(|e| e.to_string());
// println!("{out}");
// std::process::exit(0);
// }

#[rkyv(attr(derive(PartialEq, Eq, Debug)))]
pub enum GrpcAddress {
Tcp(SocketAddr),
UnixSocket(String),
Copilot AI Nov 3, 2025

[nitpick] The GrpcAddress::UnixSocket variant changed from PathBuf to String. This is less type-safe, as PathBuf better represents filesystem paths and their associated operations.

Suggested change
UnixSocket(String),
UnixSocket(PathBuf),
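
If the String representation has to stay (for example for serialization reasons, which is an assumption here and not stated in the diff), a middle ground is to keep the stored field as a String and expose it as a path at the boundary. The sketch below uses simplified stand-ins for GrpcAddress and ServerAddress without any derive attributes from the PR; the `unix_path` accessor is hypothetical.

```rust
use std::net::SocketAddr;
use std::path::{Path, PathBuf};

/// Simplified stand-in for the type under review (no rkyv attributes).
#[derive(Debug, Clone, PartialEq, Eq)]
enum GrpcAddress {
    Tcp(SocketAddr),
    UnixSocket(String),
}

impl GrpcAddress {
    /// Hypothetical accessor: view the stored string as a filesystem path.
    fn unix_path(&self) -> Option<&Path> {
        match self {
            GrpcAddress::UnixSocket(path) => Some(Path::new(path)),
            GrpcAddress::Tcp(_) => None,
        }
    }
}

/// Simplified stand-in for the consumer-side address type.
#[derive(Debug, PartialEq, Eq)]
enum ServerAddress {
    Tcp(SocketAddr),
    Unix(PathBuf),
}

impl From<GrpcAddress> for ServerAddress {
    fn from(addr: GrpcAddress) -> Self {
        match addr {
            GrpcAddress::Tcp(addr) => ServerAddress::Tcp(addr),
            // Naming the target type keeps the conversion explicit at the call site.
            GrpcAddress::UnixSocket(path) => ServerAddress::Unix(PathBuf::from(path)),
        }
    }
}

fn main() {
    let addr = GrpcAddress::UnixSocket("/run/example/grpc.sock".to_string());
    println!("socket path: {:?}", addr.unix_path());
    println!("server address: {:?}", ServerAddress::from(addr));
}
```

Writing `PathBuf::from(path)` instead of `path.into()` also addresses the earlier nitpick about the conversion being implicit.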

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-1 branch 6 times, most recently from 46cbe0e to d797cd7 Compare November 3, 2025 06:25
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch from a25b971 to 1dee052 Compare November 3, 2025 06:26

[dependencies]
# internal
args = { workspace = true }
Contributor

Why does the mgmt crate need anything from args? Sounds like a weird "backwards" dependency.


[dependencies]
# internal
args = { workspace = true }
Contributor

Same.

cpi_sock_path: Some(DEFAULT_DP_UX_PATH.to_string()),
cli_sock_path: Some(DEFAULT_DP_UX_PATH_CLI.to_string()),
frrmi_sock_path: Some(DEFAULT_FRR_AGENT_PATH.to_string()),
cpi_sock_path: Some(args::DEFAULT_DP_UX_PATH.to_string()),
Contributor

OK. This reveals the dependency. I see the goal, but I personally dislike that this makes an internal function depend on an external thing. IMO:

  • every internal function (crate) should define its own defaults.
  • those defaults should be made available to users of the crate.

I see the merit of defining the defaults closer to the user/caller. But then we should drop the defaults from the inner functions (routing in this case), so that we don't have that "backwards" dependency, and require callers to always provide specific values (which they can default).
That said, leave it as it is.
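
A minimal sketch of the second option described above, under stated assumptions: the inner layer takes fully specified values and owns no defaults, while the outer layer owns the defaults and resolves them. The types `RouterSockets` and `SocketOverrides` and the function `resolve_sockets` are hypothetical; the constant names come from the diff, but their values here are placeholders.

```rust
/// Stand-in for the inner crate (`routing` in this PR): it takes explicit
/// values and defines no defaults of its own.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouterSockets {
    pub cpi_sock_path: String,
    pub cli_sock_path: String,
    pub frrmi_sock_path: String,
}

/// Stand-in for the outer layer (`args` in this PR), which owns the defaults.
/// The constant names come from the diff; the values are placeholders.
pub mod defaults {
    pub const DEFAULT_DP_UX_PATH: &str = "/tmp/example-cpi.sock";
    pub const DEFAULT_DP_UX_PATH_CLI: &str = "/tmp/example-cli.sock";
    pub const DEFAULT_FRR_AGENT_PATH: &str = "/tmp/example-frr-agent.sock";
}

/// Hypothetical user-supplied overrides, e.g. parsed from the command line.
#[derive(Debug, Default)]
pub struct SocketOverrides {
    pub cpi_sock_path: Option<String>,
    pub cli_sock_path: Option<String>,
    pub frrmi_sock_path: Option<String>,
}

/// The caller resolves overrides against the defaults, then hands the inner
/// layer fully specified values.
pub fn resolve_sockets(overrides: SocketOverrides) -> RouterSockets {
    RouterSockets {
        cpi_sock_path: overrides
            .cpi_sock_path
            .unwrap_or_else(|| defaults::DEFAULT_DP_UX_PATH.to_string()),
        cli_sock_path: overrides
            .cli_sock_path
            .unwrap_or_else(|| defaults::DEFAULT_DP_UX_PATH_CLI.to_string()),
        frrmi_sock_path: overrides
            .frrmi_sock_path
            .unwrap_or_else(|| defaults::DEFAULT_FRR_AGENT_PATH.to_string()),
    }
}

fn main() {
    let sockets = resolve_sockets(SocketOverrides {
        cli_sock_path: Some("/run/custom-cli.sock".to_string()),
        ..Default::default()
    });
    println!("{sockets:?}");
}
```

With this split the inner type needs no import from the outer layer, so the dependency only points one way.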

Contributor

@Fredi-raspall Fredi-raspall left a comment

Looks good to me.
Since we can't merge this now as-is without breaking the pipeline, can we "freeze" this branch as an integration one on which we can stack further developments that don't break the pipeline?

@Fredi-raspall Fredi-raspall added the dont-merge label (Do not merge this Pull Request) Nov 3, 2025
@Fredi-raspall Fredi-raspall added the Integration label (Branch that should NOT be merged but kept stable as other branches depend on it.) Nov 3, 2025
Contributor

Fredi-raspall commented Nov 3, 2025

@daniel-noland There seems to be an issue with the memfd in the CI job.

Logs for Pod: gw--gateway-1--dataplane-776b6 in Namespace: fab

thread 'main' (1) panicked at args/src/lib.rs:601:18:
called `Result::unwrap()` on an `Err` value:   × failed to read memfd link in /proc
  ╰─▶ ENOENT: No such file or directory

stack backtrace:
   0: __rustc::rust_begin_unwind
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/result.rs:1855:5
   3: core::result::Result<T,E>::unwrap
             at ./nix/store/3wbbdzj5900pa0nflzp8ai9izdd6vk5y-rust-mixed/lib/rustlib/src/rust/library/core/src/result.rs:1226:23
   4: dataplane_args::FinalizedMemFile::from_fd
             at ./home/runner/_work/dataplane/dataplane/args/src/lib.rs:601:18
   5: dataplane_args::LaunchConfiguration::inherit
             at ./home/runner/_work/dataplane/dataplane/args/src/lib.rs:460:45
   6: dataplane::main
             at ./home/runner/_work/dataplane/dataplane/dataplane/src/main.rs:90:25
   7: core::ops::function::FnOnce::call_once
             at ./nix/store/3wbbdzj5900pa0nflzp8ai9izdd6vk5y-rust-mixed/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: IO Safety violation: owned file descriptor already closed, aborting

Previous Logs for Pod: gw--gateway-1--dataplane-776b6 in Namespace: fab (if available)

thread 'main' (1) panicked at args/src/lib.rs:601:18:
called `Result::unwrap()` on an `Err` value:   × failed to read memfd link in /proc
  ╰─▶ ENOENT: No such file or directory

stack backtrace:
   0: __rustc::rust_begin_unwind
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at ./rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/result.rs:1855:5
   3: core::result::Result<T,E>::unwrap
             at ./nix/store/3wbbdzj5900pa0nflzp8ai9izdd6vk5y-rust-mixed/lib/rustlib/src/rust/library/core/src/result.rs:1226:23
   4: dataplane_args::FinalizedMemFile::from_fd
             at ./home/runner/_work/dataplane/dataplane/args/src/lib.rs:601:18
   5: dataplane_args::LaunchConfiguration::inherit
             at ./home/runner/_work/dataplane/dataplane/args/src/lib.rs:460:45
   6: dataplane::main
             at ./home/runner/_work/dataplane/dataplane/dataplane/src/main.rs:90:25
   7: core::ops::function::FnOnce::call_once
             at ./nix/store/3wbbdzj5900pa0nflzp8ai9izdd6vk5y-rust-mixed/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: IO Safety violation: owned file descriptor already closed, aborting

@Frostman could this be caused by invoking dataplane directly instead of dataplane-init?
From now on the dataplane should be started with the dataplane-init binary, right @daniel-noland?
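
Independently of the root cause, the panic in the log comes from unwrapping the result of reading a link under /proc. A sketch of returning a descriptive error instead, with a hint about the likely cause, might look like the following. The function `fd_link_target` is hypothetical, the wording of the hint is an assumption, and the example is Linux-only because it relies on /proc/self/fd.

```rust
use std::fs::{self, File};
use std::io;
use std::os::fd::{AsRawFd, RawFd};
use std::path::PathBuf;

/// Resolve what a file descriptor points at by reading its /proc link,
/// returning a descriptive error instead of panicking when it is missing.
fn fd_link_target(fd: RawFd) -> io::Result<PathBuf> {
    let link = format!("/proc/self/fd/{fd}");
    fs::read_link(&link).map_err(|e| {
        // Keep the original error kind so callers can still match on it.
        io::Error::new(
            e.kind(),
            format!("failed to read {link}: {e} (was this process started by dataplane-init?)"),
        )
    })
}

fn main() {
    let file = File::open("/dev/null").expect("failed to open /dev/null");
    match fd_link_target(file.as_raw_fd()) {
        Ok(target) => println!("fd {} -> {}", file.as_raw_fd(), target.display()),
        Err(e) => eprintln!("{e}"),
    }
    // A descriptor that was never inherited yields an error, not a panic.
    if let Err(e) = fd_link_target(1_000_000) {
        eprintln!("{e}");
    }
}
```

The caller can then decide whether to exit with a clear message rather than aborting with a backtrace.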

@Fredi-raspall Fredi-raspall removed the Integration label (Branch that should NOT be merged but kept stable as other branches depend on it.) Nov 3, 2025
Member

@qmonnet qmonnet left a comment

The args crate now has a wider scope: it is not just responsible for parsing and validating arguments, it is also (partly) responsible for handing the dataplane a memfd file with parsed and acted-on data.

I'm not sure I understand: Why reuse the args crate for that? Wouldn't it make more sense to add a new crate for handling the memfd file, specifically? I liked the idea of having something a bit more self-contained for parsing the arguments.

If it is necessary to have the new parts in args, should we consider renaming that crate?

args/src/lib.rs Outdated
/// A type wrapper around [`MemFile`] for memfd files which are emphatically NOT intended for any kind of data mutation
/// ever again.
///
/// Multiple protections are in place to deny all attempts to mutat the memory contents of these files.
Member

Suggested change
/// Multiple protections are in place to deny all attempts to mutat the memory contents of these files.
/// Multiple protections are in place to deny all attempts to mutate the memory contents of these files.

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-1 branch 3 times, most recently from 51a8366 to b0f0f8c Compare November 3, 2025 20:42
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch from 1dee052 to 2cd696e Compare November 3, 2025 21:36
Base automatically changed from pr/daniel-noland/release-prep-part-1 to main November 3, 2025 22:03
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch 3 times, most recently from 7f67473 to 08722af Compare November 5, 2025 00:55
This commit takes care of the actual passing of a memfd from
dataplane init down into dataplane after a hardware scan and
argument parsing.

It basically takes care of the TODO from last time.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch from 08722af to c619ba2 Compare November 5, 2025 19:34
With this commit the dataplane now consumes and acts on the memfd
created by dataplane init from the previous commit.

Signed-off-by: Daniel Noland <daniel@githedgehog.com>
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/release-prep-part-2 branch from c619ba2 to 6afb8b5 Compare November 5, 2025 19:38
@mvachhar mvachhar modified the milestones: GW R1, GW R2 Nov 14, 2025
@mvachhar mvachhar marked this pull request as draft November 14, 2025 15:43
@mvachhar
Contributor

Moved this back to draft since we may need to rework this a bit. Now that the kernel driver has good performance we should only merge this with a working kernel driver.

Labels

dont-merge (Do not merge this Pull Request), enhancement (New feature or request)

5 participants