Skip to content

add Ubuntu/Fedora/Arch + custom images, streaming hash, and real E2E tests#18

Draft
userhaptop wants to merge 21 commits intobuck2hub:mainfrom
userhaptop:main
Draft

add Ubuntu/Fedora/Arch + custom images, streaming hash, and real E2E tests#18
userhaptop wants to merge 21 commits intobuck2hub:mainfrom
userhaptop:main

Conversation

@userhaptop
Copy link
Contributor

@userhaptop userhaptop commented Feb 22, 2026

No description provided.

Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
… streaming hash computation

Signed-off-by: userhaptop <1307305157@qq.com>
…d real E2E tests

Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
@genedna genedna requested a review from Copilot February 22, 2026 16:13
@genedna
Copy link
Contributor

genedna commented Feb 22, 2026

@codex review

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands Qlean’s image and E2E validation capabilities by adding Fedora/Arch support with “latest stable” resolution, improving download robustness (streaming hash + mirror fallback), and replacing placeholder tests with gated real end-to-end VM startup flow tests.

Changes:

  • Add gated (env-controlled) E2E VM startup-flow integration tests for Ubuntu/Fedora/Arch with actionable skip reasons.
  • Add vsock → TCP SSH fallback and QEMU user-networking fallback (hostfwd) to improve reliability on constrained hosts (e.g., WSL2).
  • Add “latest stable” Fedora/Arch image resolution plus multi-mirror download with streaming SHA verification and stall detection.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/ubuntu_image.rs Ubuntu E2E test now runs full boot + SSH whoami flow, gated by env + host capability checks.
tests/fedora_image.rs New Fedora E2E startup-flow test gated by env + libguestfs availability.
tests/arch_image.rs New Arch E2E startup-flow test gated by env + libguestfs availability.
tests/common.rs Centralized E2E gating helpers (KVM/libvirt/vsock/WSL checks) and safer tracing subscriber init.
tests/guestfish.rs Adds runtime/heuristic detection and skip messaging for libguestfs tooling (guestfish/virt-copy-out).
src/ssh.rs Adds TCP SSH fallback and refines vsock error handling/logging for fallbacks.
src/qemu.rs Adds bridged-network preference with fallback to slirp + hostfwd; supports TCP SSH fallback port.
src/machine.rs Reserves an ephemeral local port for SSH hostfwd and passes it through to QEMU + SSH connect.
src/image.rs Adds Fedora/Arch “latest stable” resolvers, streaming hash download w/ idle timeout, and multi-mirror retry with verification.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: userhaptop <1307305157@qq.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e13917bcf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

userhaptop and others added 4 commits February 23, 2026 00:43
…E2E no longer

  requires libguestfs tools.
- Retry Fedora CHECKSUM fetch across mirrors (match qcow2 mirror resilience).
- Fix ShaType ownership in cache/download paths.

Signed-off-by: userhaptop <1307305157@qq.com>
@genedna
Copy link
Contributor

genedna commented Feb 23, 2026

@claude

@github-actions
Copy link

github-actions bot commented Feb 23, 2026

Claude finished @genedna's task in 4m 12s —— View job


PR Review

This PR substantially improves Qlean's image module. The overall direction is sound — the review feedback from prior comments has been addressed, and the new abstractions (streaming hash, multi-mirror download, TCP SSH fallback) are well-structured. Below are findings ranging from bugs to style nits.


Critical / Correctness

1. TCP port reservation is racy (TOCTOU) — src/machine.rs:140-147

let ssh_tcp_port = std::net::TcpListener::bind(("127.0.0.1", 0))
    .map(|l| l.local_addr().map(|a| a.port()))
    ...??;

The listener is dropped immediately after reading the port, creating a window where another process can claim it before QEMU binds the hostfwd. The comment in the code acknowledges this but accepts the race silently. A safer pattern is to keep the TcpListener alive and pass it through to QemuLaunchParams so it holds the port until QEMU starts and inherits/replaces it. At minimum, document the race prominently.

Fix this →


2. Duplicate save logic with unguarded unwrap()src/image.rs:435-437

create_with_action duplicates the checksum/save block from ImageMeta::save(), but replaces .with_context() with bare .unwrap():

let image_filename = image.path.file_name().unwrap().to_string_lossy();
let kernel_filename = image.kernel.file_name().unwrap().to_string_lossy();
let initrd_filename = image.initrd.file_name().unwrap().to_string_lossy();

save() correctly uses .with_context() for these. Either call image.save(name).await? (which already exists) or promote these to proper error propagation. The divergence is a maintenance hazard and the panics are avoidable.

Fix this →


3. Ubuntu download has no hash verification — src/image.rs:596-625

The Ubuntu::download() method still uses download_file() (no integrity check), while Fedora and Arch have been upgraded to download_with_hash_multi() with SHA verification. Ubuntu's kernel and initrd are downloaded with no hash at all. This should use Ubuntu's cloud image SHA256SUMS file for at-rest verification.

Fix this →


Design / Maintainability

4. download_file() still present and used — src/image.rs:1037-1056

The old download_file() helper (no hash, no idle timeout, no retry) still exists and is used by Ubuntu. Now that download_with_hash has idle-timeout and progress logging, download_file is a degraded code path. Either upgrade Ubuntu to use the proper path (see #3), or remove download_file entirely to prevent future misuse.


5. resolve_latest_fedora_release depends on a single external API — src/image.rs

let url = "https://endoflife.date/api/v1/products/fedora/";

This introduces an undocumented external dependency on endoflife.date. If the API changes structure or is unavailable, Fedora image creation silently fails. Consider: (a) adding a fallback version constant, (b) caching the result per process, or (c) documenting this dependency in the README/code.


6. Dual NIC when bridging is available — src/qemu.rs:76-92

When want_bridge is true, the code adds both a bridge NIC (net0) and optionally a user-mode NIC (net1). The second NIC gets no explicit MAC address (QEMU auto-generates one). The get_remote_ip() function uses hostname -I | awk '{print $1}' which returns the first IP — with two NICs, this could return the wrong address. If a user-mode NIC is purely a "SSH escape hatch", consider not advertising it to the guest at all (use restrict=on in the netdev string) or ensure get_ip() returns the bridge-assigned address.


7. should_run_vm_tests() blocks on vsock check even when TCP fallback is available — tests/support/e2e.rs:59-67

if !has_vhost_vsock() {
    eprintln!("SKIP: vhost-vsock not available...");
    return false;
}

The PR's stated goal is to support environments without vsock (via TCP fallback). The test gating still unconditionally skips when vsock is absent, even though the new TCP path should allow those environments to run E2E tests. This effectively makes the TCP fallback untestable in automated gating.


8. vsock_timeout hard-coded to 15 seconds — src/ssh.rs

let vsock_timeout = if tcp_port.is_some() {
    std::cmp::min(timeout, Duration::from_secs(15))

On a cold-boot first-init (KVM with cloud-init), 15 seconds can be too short for the vsock device to be ready. This causes a spurious TCP fallback every time on slower hosts and loses vsock's lower latency. Consider making this configurable or at least 30s.


Minor / Style

9. ssh_tcp_port field doc comment vs. type mismatch — src/machine.rs:37

The doc comment says "used as a fallback when vsock is unavailable" but the field is typed u16 (always allocated), not Option<u16>. The doc comment from Copilot's suggestion is more accurate. The type itself is fine; update the comment to clarify that the port is always allocated but only used as a fallback.

10. is_wsl() is duplicated — src/qemu.rs and tests/support/guestfish.rs and tests/support/e2e.rs

Three nearly-identical implementations exist. Moving is_wsl() to src/utils.rs (or a shared utils module) would reduce divergence. The qemu.rs version checks /proc/sys/kernel/osrelease and /proc/version; the test versions only check /proc/version or env vars.

11. find_hash_for_file returns None silently — src/image.rs

When find_hash_for_file() returns None, the caller (e.g., Arch) uses .with_context(). But for Fedora, the CHECKSUM parsing is done in fetch_fedora_checksum_sha256, where a None result produces a generic error: "checksum file {} did not contain hash for {}". This is fine, but the error message omits the parsed checksum file content, making it hard to debug malformed files in production. Consider logging the checksum content at debug level on failure.

12. test_distro_enum_variants removed without replacement

The PR removes test_distro_enum_variants (the assert_eq!(variants.len(), 5) test), citing "meaningless tests" in the PR description. While that specific test was weak, there's now no coverage to catch someone adding a new Distro variant without updating create_image(). A match expression in create_image() without a wildcard arm provides compile-time coverage, which is sufficient — just note this is now compiler-enforced rather than test-enforced.


Summary

Area Status
Prior review issues (Copilot/Codex) ✅ Addressed
Ubuntu hash verification ❌ Missing
Port TOCTOU ⚠️ Acknowledged but fragile
create_with_action unwraps ❌ Should use with_context
TCP fallback testability ⚠️ Test gating contradicts feature
download_file dead code path ⚠️ Should be removed/upgraded
Multi-mirror download ✅ Well-implemented
Streaming hash ✅ Correct and well-tested
E2E test gating via env var ✅ Clean approach

The core implementations are solid. The most important things to fix before merge are: Ubuntu hash verification (#3), the create_with_action unwraps (#2), and removing or upgrading the leftover download_file usage (#4).

…irror download integrity, E2E testing gating, and maintainability.

Signed-off-by: userhaptop <1307305157@qq.com>
…SL/restricted environments: enhanced download verification and skip logic in Fedora/Arch, and fixed the potential unwrap() risk in save() mentioned in the review.

Signed-off-by: userhaptop <1307305157@qq.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@jjl9807 jjl9807 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are two points to address first:

  1. Please paste the complete output logs from running these tests locally:

    QLEAN_RUN_E2E=1 cargo test --test ubuntu_image -- --nocapture
    QLEAN_RUN_E2E=1 cargo test --test fedora_image -- --nocapture
    QLEAN_RUN_E2E=1 cargo test --test arch_image -- --nocapture
  2. Please do not spend further effort on WSL. This is simply not a core scenario for Qlean.

@jjl9807 jjl9807 marked this pull request as draft February 24, 2026 09:10
@jjl9807
Copy link
Collaborator

jjl9807 commented Feb 24, 2026

Based on the logs you uploaded, it appears the Fedora and Arch code hasn't undergone any actual verification; both fedora_image and arch_image tests were skipped. All tests must run through the complete workflow. Do not skip them. If there are issues, they should fail explicitly. A false pass is meaningless.

Additionally, libguestfs-tools runs perfectly fine on WSL. Please devote the necessary time and effort to resolving your local environment issues; there is no need to specifically detect the WSL environment. Furthermore, KVM unavailability is not a valid reason to skip tests. Qlean already supports automatically switching to the TCG engine when KVM is unavailable.

@userhaptop userhaptop marked this pull request as ready for review February 24, 2026 14:14
… VM boot/ssh

Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b05d4bd06

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@genedna genedna requested a review from jjl9807 February 25, 2026 08:33
@jjl9807 jjl9807 marked this pull request as draft February 27, 2026 10:41
Copy link
Collaborator

@jjl9807 jjl9807 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2026-02-24T22:10:20.076056856+08:00 WARN qlean::qemu: Kernel/initrd extraction is unavailable. Booting from qcow2 disk image directly.

Please fix the libguestfs-tools issue properly and remove the fallback mechanism. This PR will not be considered for merging until this is resolved.

Furthermore, understand that the reason previous reviews only highlighted one or two issues was not because those were the only problems. It was because listing every single flaw would have been futile, given that my feedback is consistently ignored.

There are fundamental issues with at least the tests and network implementations in this PR. Do not attempt to mark this PR as "Ready" again. It is misleading and deceives no one.

Finally, address the problems raised in the review directly. Stop adding arbitrary fallback mechanisms that undermine the project architecture.

@userhaptop
Copy link
Contributor Author

Got it. I’ll remove the qcow2 direct-boot fallback, make Fedora/Arch fail explicitly when kernel/initrd extraction cannot complete through the libguestfs/guestfish path, and remove the TCP/user-mode fallback if the intended networking path for Qlean here is bridge + vsock only. Please confirm that, for this PR, Fedora/Arch E2E is expected to require successful local extraction and that the expected network path is bridge + vsock only.

…e full E2E flow (build image → boot VM → SSH → run checks → shutdown) with no skips, no WSL special-casing, and no “KVM missing → skip” (falls back to QEMU TCG).

Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
Signed-off-by: userhaptop <1307305157@qq.com>
@userhaptop
Copy link
Contributor Author

userhaptop commented Mar 8, 2026

This commit removes the runtime libguestfs fallback paths (no more automatic fixed-appliance, supermin, or kernel-package provisioning) and eliminates the network-layer TCP / user-mode fallback, so Qlean now depends on a correctly prepared host libguestfs-tools, bridge, and vsock setup while running the Ubuntu / Fedora / Arch E2E flows end-to-end under QLEAN_RUN_E2E=1, failing explicitly when prerequisites are not met.

The only remaining fallback behavior is limited to the download layer (mirror retry / mirror fallback) and SSH readiness retry when vsock is not yet ready; these do not change the project architecture and do not mask failures as passing tests.

Thank you for pointing out the issues in the previous revisions. I appreciate the detailed review, and please continue to point out anything that is still incorrect or inconsistent with the project’s architecture or expectations.

@userhaptop
Copy link
Contributor Author

my_user@LAPTOP-30HORNHG:~/qlean$ sudo bash scripts/setup-host-prereqs.sh
...
ibguestfs: sending SIGTERM to process 10439
libguestfs: qemu maxrss 233596K
libguestfs: closing guestfs handle 0x5d11a464fa80 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsGdwblE
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfssdPcf4
===== TEST FINISHED OK =====
DONE

my_user@LAPTOP-30HORNHG:~/qlean$ QLEAN_RUN_E2E=1 cargo test --test ubuntu_image -- --nocapture
QLEAN_RUN_E2E=1 cargo test --test fedora_image -- --nocapture
QLEAN_RUN_E2E=1 cargo test --test arch_image -- --nocapture
Finished test profile [unoptimized + debuginfo] target(s) in 0.25s
Running tests/ubuntu_image.rs (target/debug/deps/ubuntu_image-879972d822a601b8)

running 1 test
INFO: host checks passed
INFO: creating image
INFO: image ready: /home/my_user/.local/share/qlean/images/ubuntu-noble-cloudimg/ubuntu-noble-cloudimg.qcow2
INFO: starting VM and waiting for SSH
2026-03-08T18:07:50.153305992+08:00 INFO qlean::machine: 🚀 Initializing VM-0b91c8f5d349
2026-03-08T18:07:50.154283229+08:00 INFO qlean::qemu: Starting QEMU
2026-03-08T18:07:50.213021+08:00 INFO qlean::ssh: Connecting SSH via vsock cid=10 port=22 as root (budget 12s)
2026-03-08T18:07:50.214624003+08:00 INFO qlean::ssh: Connecting SSH via vsock
2026-03-08T18:08:02.290285778+08:00 WARN qlean::ssh: Timeout while connecting to VM via vsock
2026-03-08T18:08:02.290487942+08:00 WARN qlean::ssh: SSH via vsock not ready yet: Timeout while connecting to VM via vsock.
Hint: Qlean uses vhost-vsock for SSH. Ensure /dev/vhost-vsock exists and the hypervisor provides a working vsock path.
2026-03-08T18:08:01.817597127+08:00 INFO qlean::ssh: Connecting SSH via vsock cid=10 port=22 as root (budget 12s)
2026-03-08T18:08:01.81771192+08:00 INFO qlean::ssh: Connecting SSH via vsock
2026-03-08T18:08:09.074067302+08:00 INFO qlean::ssh: ✅ Connected via vsock as root
2026-03-08T18:08:09.88009977+08:00 INFO qlean::machine: 🧬 Executing command . /etc/os-release && echo $ID on VM-0b91c8f5d349
2026-03-08T18:08:09.884663068+08:00 INFO qlean::machine: 🧬 Executing command id -u on VM-0b91c8f5d349
2026-03-08T18:08:09.919903066+08:00 INFO qlean::machine: 🔌 Shutting down VM-0b91c8f5d349
2026-03-08T18:08:13.875106625+08:00 INFO qlean::qemu: ⏏️ Process 10769 exited as expected
test test_ubuntu_image_creation ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 34.98s

Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
 Running tests/fedora_image.rs (target/debug/deps/fedora_image-98bd5967d14ab249)

running 1 test
INFO: host checks passed
INFO: creating image
INFO: image ready: /home/my_user/.local/share/qlean/images/fedora-cloud/fedora-cloud.qcow2
INFO: starting VM and waiting for SSH
2026-03-08T18:08:22.869697021+08:00 INFO qlean::machine: 🚀 Initializing VM-095dfdab072b
2026-03-08T18:08:22.870155381+08:00 INFO qlean::qemu: Starting QEMU
2026-03-08T18:08:22.924384826+08:00 INFO qlean::ssh: Connecting SSH via vsock cid=10 port=22 as root (budget 12s)
2026-03-08T18:08:22.924557538+08:00 INFO qlean::ssh: Connecting SSH via vsock
2026-03-08T18:08:34.31522103+08:00 WARN qlean::ssh: Timeout establishing SSH over vsock
2026-03-08T18:08:34.315333197+08:00 WARN qlean::ssh: SSH via vsock not ready yet: Timeout establishing SSH handshake over vsock.
Hint: Ensure the guest exposes SSH over vsock (Qlean configures a vsock->TCP proxy via cloud-init) and that the VM finished booting.
2026-03-08T18:08:34.56611153+08:00 INFO qlean::ssh: Connecting SSH via vsock cid=10 port=22 as root (budget 12s)
2026-03-08T18:08:34.566256759+08:00 INFO qlean::ssh: Connecting SSH via vsock
2026-03-08T18:08:38.88777568+08:00 INFO qlean::ssh: ✅ Connected via vsock as root
2026-03-08T18:08:39.634915559+08:00 INFO qlean::machine: 🧬 Executing command . /etc/os-release && echo $ID on VM-095dfdab072b
2026-03-08T18:08:39.665910939+08:00 INFO qlean::machine: 🧬 Executing command id -u on VM-095dfdab072b
2026-03-08T18:08:39.78029609+08:00 INFO qlean::machine: 🔌 Shutting down VM-095dfdab072b
2026-03-08T18:08:41.277645555+08:00 INFO qlean::qemu: ⏏️ Process 11122 exited as expected
test test_fedora_image_startup_flow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 27.83s

Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
 Running tests/arch_image.rs (target/debug/deps/arch_image-61830ce79c97f01e)

running 1 test
INFO: host checks passed
INFO: creating image
INFO: image ready: /home/my_user/.local/share/qlean/images/arch-cloudimg/arch-cloudimg.qcow2
INFO: starting VM and waiting for SSH
2026-03-08T18:08:49.835935384+08:00 INFO qlean::machine: 🚀 Initializing VM-87904b239663
2026-03-08T18:08:49.836404147+08:00 INFO qlean::qemu: Starting QEMU
2026-03-08T18:08:49.89010724+08:00 INFO qlean::ssh: Connecting SSH via vsock cid=10 port=22 as root (budget 12s)
2026-03-08T18:08:49.890235567+08:00 INFO qlean::ssh: Connecting SSH via vsock
2026-03-08T18:09:00.316608635+08:00 INFO qlean::ssh: ✅ Connected via vsock as root
2026-03-08T18:09:03.390533374+08:00 INFO qlean::machine: 🧬 Executing command . /etc/os-release && echo $ID on VM-87904b239663
2026-03-08T18:09:03.400068851+08:00 INFO qlean::machine: 🧬 Executing command id -u on VM-87904b239663
2026-03-08T18:09:03.455975286+08:00 INFO qlean::machine: 🔌 Shutting down VM-87904b239663
2026-03-08T18:09:05.255394271+08:00 INFO qlean::qemu: ⏏️ Process 11393 exited as expected
test test_arch_image_startup_flow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.41s

Signed-off-by: userhaptop <1307305157@qq.com>
@userhaptop
Copy link
Contributor Author

I have unified hash verification, default end-to-end encryption, and the official mirror source based on TOML. The download process is also basically unified to a single path, and I no longer use multiple mirrors for fallback and have removed most of the probing logic.
I appreciate the detailed review, and please continue to point out anything that is still incorrect or inconsistent with the project’s architecture or expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants