@phip1611 phip1611 commented Nov 12, 2025

This PR rebases our patchset onto the latest Cloud Hypervisor release v49, following our process:

  • cherry-pick commits step by step from the latest gardenlinux branch
  • where the history had the shape a -> b -> c -> "fix a", I rewrote it to (a + "fix a") -> b -> c, that is:
    • I squashed fixups into the commits they fix,
    • and reordered some commits
  • no functional changes or additions; only what was necessary to get our existing code to work
  • I cherry-picked a few selected commits from upstream that affect CI and/or developer productivity
  • I rephrased a few commits to satisfy our commit lint tool

The PR should not be merged. Once we are happy with it, we close the PR and rename the branch to gardenlinux, and the old gardenlinux branch will be renamed to gardenlinux-v48 via the GitHub web UI.

Steps

  • wait until feature-freeze (we want the CPU profiles in here)
  • libvirt-tests pass
  • three approvals

@phip1611 phip1611 changed the base branch from gardenlinux to next-gardenlinux-v49-base November 12, 2025 18:04
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch 2 times, most recently from 8704176 to 352fedb Compare November 13, 2025 09:42
@phip1611 phip1611 marked this pull request as draft November 13, 2025 10:14
Remove irrelevant/annoying CI here to accelerate development.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
To check gitlint locally, one can run:

gitlint --commits "HEAD~2..HEAD"

which for example checks the last two commits.

Although this is just our kinda private (but public) fork, people might
cherry-pick commits from us for whatever reason. So we should have
proper commit style.

On-behalf-of: SAP philipp.schuster@sap.com
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch from 352fedb to 52f1e86 Compare November 24, 2025 07:38
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch 2 times, most recently from aac56c1 to 0f95669 Compare December 3, 2025 11:55
TL;DR: Fix for long rebuilds locally when testing things.

The release profile is optimized for maximum performance,
sacrificing build speed. As local development and testing requires
frequent rebuilds, but the dev profile is way too slow for
"real testing", this profile is a sweet spot and helps to
investigate things.

Instead of `cargo run --release`, one can now run
`cargo run --profile optimized-dev`.

# Measurements

Measurements were done using
`$ [cargo clean;] time cargo build --profile release|optimized-dev` and
rustc 1.89. I've used the `time`-builtin from zsh.

Note that user time is much higher as we have more threads
(codegen units) now. The total time is much shorter, though.

## Clean Build

Speedup of about 54% (58,343 s vs. 37,876 s total).

- `$ time cargo build --release`:
  `109,67s user 13,64s system 211% cpu 58,343 total`
- `$ time cargo build --profile optimized-dev`:
  `185,41s user 14,92s system 528% cpu 37,876 total`

## Incremental Build

Speedup of about 152% (33,356 s vs. 13,220 s total).

- `$ time cargo build --release`:
  `37,58s user 1,53s system 117% cpu 33,356 total`
- `$ time cargo build --profile optimized-dev`:
  `47,62s user 1,71s system 373% cpu 13,220 total`

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
With debug symbols, we will get better backtraces and can
improve our experience debugging. The only downside is larger
binary size which is negligible in our case. There are no
implications for the performance.

Stripped:   3.9M
Unstripped: 4.7M

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Cherry-picked three commits from upstream + squashed them.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch from 0f95669 to a184e07 Compare December 3, 2025 12:45
@phip1611 phip1611 self-assigned this Dec 3, 2025
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch from a184e07 to ad262ce Compare December 5, 2025 17:09
TL;DR: Massive quality of life improvement for devs

Cloud Hypervisor uses the Cargo test framework for multiple tests:

- normal unit tests
- unit tests requiring special environment (the Tap device tests)
- integration tests requiring a special environment

This prevented the execution of `cargo test --workspace`, which resulted
in a very poor developer experience. Although
`./scripts/run_unit_tests.sh` exists, there are valid reasons why devs
cannot or do not want to use it.

By adding a new `chv_testenv` rustc config, we can activate the tests
that need a special environment only when the `./scripts/` tooling runs
them. This improves the general developer experience by a lot.
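
A minimal sketch of how such a gate can look in a test module. The
function `parse_mac_octet` and both test names are made up for
illustration, and passing the flag via `RUSTFLAGS="--cfg chv_testenv"`
is an assumption; how the scripts actually set it is not shown here.

```rust
/// Plain logic that every developer should be able to test anywhere.
/// (Hypothetical helper, not taken from the Cloud Hypervisor sources.)
pub fn parse_mac_octet(s: &str) -> Option<u8> {
    u8::from_str_radix(s, 16).ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Always compiled for tests, so a plain `cargo test` runs it.
    #[test]
    fn parses_hex_octet() {
        assert_eq!(parse_mac_octet("ff"), Some(255));
    }

    // Only compiled when the build passes `--cfg chv_testenv`
    // (assumed here to be set by the test scripts).
    #[cfg(chv_testenv)]
    #[test]
    fn needs_special_environment() {
        // Placeholder for a test that needs, e.g., a Tap device.
        assert!(std::path::Path::new("/dev/net/tun").exists());
    }
}
```

Without the flag, the second test is not even compiled, so it can
neither fail nor show up as ignored.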

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
A major improvement to the developer experience of clippy in
Cloud Hypervisor.

1. Make `cargo clippy` just work with the same lints we use in CI
2. Simplify adding new lints

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This is the first commit in a series of commits to improve the Code
Quality in Cloud Hypervisor in a sustainable way. These are the
default rules from `clippy::all` but written here to be more explicit.
`clippy::all` refers to all "default sensible" lints, not all
existing lints.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This is a list of squashed commits.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
@phip1611 phip1611 force-pushed the next-gardenlinux-v49 branch from 3fae3dd to 027aeca Compare December 17, 2025 08:46
Jinrong Liang and others added 9 commits December 17, 2025 10:58
# This is the 1st commit message:

vmm: pr cloud-hypervisor#7033 squashed 2025-08-18: downtime limits

Current (squashed) state of:
 https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7033/commits

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
---

vm-migration: Add support for downtime limits

Add handling of migration timeout failures to provide more flexible
live migration options. Implement downtime limiting logic to minimize
service disruptions. Support for setting downtime thresholds and
migration timeouts.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>

docs: Add migration parameters to live migration document

Updated live migration documentation to include migration timeout
controls and downtime limits.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>

tests: Add downtime and migration timeout tests

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>
Also see [0] for more info.

[0] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
No need to grab the lock multiple times in this short period
of time. The lock is anyway held for the duration of the long
operation (KVM_RUN).

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
These are the prerequisites for the upcoming (quick and dirty)
solution to the problem that we might miss some events.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
A common scenario for a VMM to regain control over the vCPU thread from
the hypervisor is to interrupt the vCPU. A use-case might be the `pause`
API call of CHV.

VMMs using KVM as hypervisor must use signals for this interruption,
i.e., a thread sends a signal to the vCPU thread. Sending and handling
these signals is inherently racy because the signal sender does not know
if the receiving thread is currently in the KVM_RUN [0] call, or
executing userspace VMM code.

If we are in kernel space in KVM_RUN, things are easy as KVM just exits
with -EINTR. For user-space this is more complicated. For example, it
might happen that we receive a signal but the vCPU thread was about to
go into the KVM_RUN system call as next instruction. There is no more
opportunity to check for any pending signal flag or similar.

KVM offers the `immediate_exit` flag [1] as part of the KVM_RUN
structure for that. The signal handler of a vCPU is supposed to set this
flag, to ensure that we do not miss any events. If the flag is set,
KVM_RUN will exit immediately [2].

We will miss signals to the vCPU if the vCPU thread is in userspace VMM
code and we do not use the `immediate_exit` flag.

We must have access to the KVM_RUN data structure when the signal
handler executes in a vCPU thread's context and set the
`immediate_exit` [1] flag. This way, the next invocation of KVM_RUN
exits immediately and the userspace VMM code can do the normal event
handling.

We must not use any shared locks between the normal vCPU thread VMM
code and the signal handler, as otherwise we might end up in deadlocks.

The signal handler therefore needs its dedicated mutable version of
KVM_RUN.
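
The flag protocol described above can be modelled without real signals
or KVM. The following is only a simulation under stated assumptions:
`KvmRunModel`, `on_signal`, `kvm_run`, and `vcpu_loop_iteration` are
invented names, and the real `kvm_run` structure is an mmap'ed kernel
structure, not a Rust struct with an atomic field.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Stand-in for the `kvm_run` structure; only the one field that
/// matters for this discussion is modelled.
pub struct KvmRunModel {
    pub immediate_exit: AtomicU8,
}

#[derive(Debug, PartialEq)]
pub enum RunOutcome {
    /// The guest ran until some regular VM exit.
    GuestExit,
    /// The run call returned right away, comparable to -EINTR.
    Interrupted,
}

/// What the signal handler does: a single atomic store, no locks.
pub fn on_signal(run: &KvmRunModel) {
    run.immediate_exit.store(1, Ordering::SeqCst);
}

/// Models one KVM_RUN call: the flag is checked on entry.
pub fn kvm_run(run: &KvmRunModel) -> RunOutcome {
    if run.immediate_exit.load(Ordering::SeqCst) != 0 {
        RunOutcome::Interrupted
    } else {
        RunOutcome::GuestExit
    }
}

/// One iteration of the vCPU loop. After an interrupted run, the VMM
/// clears the flag so that the next run can enter the guest again.
pub fn vcpu_loop_iteration(run: &KvmRunModel) -> RunOutcome {
    let outcome = kvm_run(run);
    if outcome == RunOutcome::Interrupted {
        run.immediate_exit.store(0, Ordering::SeqCst);
    }
    outcome
}
```

A signal that arrives while the thread is still in userspace code sets
the flag, so the next call to `kvm_run` returns `Interrupted` instead
of entering the guest, and the event is not lost.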

This commit introduces a (very hacky but good enough for a PoC) solution
to this problem.

[0] https://docs.kernel.org/virt/kvm/api.html#kvm-run
[1] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure
[2] https://elixir.bootlin.com/linux/v6.12/source/arch/x86/kvm/x86.c#L11566

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
auto-converge (vCPU throttling) is a technique combined with precopy
live-migration flows to migrate VMs with a high dirty rate
(high working set with many writes). It is an alternative to postcopy
migration, which is not yet implemented in Cloud Hypervisor.

By throttling the vCPUs incrementally, the dirty rate drops and the
VM migrates (converges) eventually. More specifically, the reduced
dirty rate ensures that the configured downtime can be reached.

The implementation is inspired by QEMU, but adapted to Cloud
Hypervisor. Various discussions, intermediate steps, and experiments
led to this final result.

vCPU throttling was implemented with a dedicated thread and a
manager for that thread. This thread utilizes the CpuManager's
pause() and resume() in conjunction with (interruptible) sleeps
to apply the current throttling percentage onto the vCPUs, thus
the VM. The implementation is designed to not block or delay
normal operation any longer than necessary.

The proposed design relies on the recent improvements
and fixes for CpuManager's pause() and resume(). For correctness,
on each pause/resume cycle, the time for these actions is measured.
This way, a dynamic timeslice can be used, guaranteeing the VM
is indeed throttled at the intended percentage.
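
As a rough sketch of the timeslice arithmetic (not the actual
implementation): if the vCPUs run for a slice `run_slice` and should be
stopped for `percent` of the total time, the pause has to last
`run_slice * percent / (100 - percent)`. The time already spent in
pause() and resume() counts towards that pause. The function names are
hypothetical, and the formula is assumed from the QEMU-style approach
mentioned above.

```rust
use std::time::Duration;

/// How long the vCPUs should stay paused per cycle so that they are
/// stopped for `percent` of the total time.
pub fn target_pause(run_slice: Duration, percent: u8) -> Duration {
    assert!(percent < 100, "100% would never let the guest run");
    run_slice * u32::from(percent) / u32::from(100 - percent)
}

/// Sleep time that is left after subtracting the measured time spent
/// in pause() and resume(); never negative.
pub fn remaining_sleep(
    run_slice: Duration,
    percent: u8,
    measured_overhead: Duration,
) -> Duration {
    target_pause(run_slice, percent).saturating_sub(measured_overhead)
}
```

For a 10 ms run slice, 50% throttling gives a 10 ms pause, while 99%
gives a 990 ms pause.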

Although not supported yet by Cloud Hypervisor, this thread will
support throttling cancellation when live-migrations are cancelled.

This was intensively tested in an automated setup with thousands
of live-migrations with VMs under load.

- auto-converging always starts after two memory delta transfer
  iterations
- every two iterations, the throttling is increased (step size is 10%)
- maximum throttling is 99%
- the VM will get slower. At 99% throttling, it will be unsurprisingly
  barely usable. This is something users have to accept if they want to
  migrate their VMs running heavy workloads.
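
The schedule in the list above can be sketched as a pure function. The
constant and function names are illustrative only, and the sketch
assumes that the first throttling step is 10%, which the list does not
state explicitly.

```rust
const STEP_PERCENT: u32 = 10;
const MAX_PERCENT: u32 = 99;
const ITERATIONS_PER_STEP: u32 = 2;

/// Throttle percentage after `completed_iterations` memory delta
/// transfer iterations: 0 before the second iteration, then one step
/// every two iterations, capped at the maximum.
pub fn throttle_percent(completed_iterations: u32) -> u32 {
    let steps = completed_iterations / ITERATIONS_PER_STEP;
    steps.saturating_mul(STEP_PERCENT).min(MAX_PERCENT)
}
```

Under these assumptions, the cap of 99% is reached after 20 iterations.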

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Reviewed-by: Stefan Kober <stefan.kober@cyberus-technology.de>
Reviewed-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
Reviewed-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
In addition to configuration options like pty, file, tty, ... we allow
setting the serial device to be accessed via some open TCP port on the
host.

Signed-off-by: Stefan Kober <stefan.kober@cyberus-technology.de>
On-behalf-of: SAP stefan.kober@sap.com
Signed-off-by: Stefan Kober <stefan.kober@cyberus-technology.de>
On-behalf-of: SAP stefan.kober@sap.com
olivereanderson and others added 28 commits December 17, 2025 10:59
In order for guests to use AMX it is necessary to ask the kernel to
enable the related state components for guests. While Cloud Hypervisor
already does this, we would prefer to extract the logic into a
standalone (reusable) function.

In this commit we only introduce the error type that will later be part
of the enable_amx_state_components function's signature.

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce a static enable_amx_state_components method on the
XSaveState struct that will be used in a follow up commit. We will
also extend the logic of what this method does in the near future.

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We use the central amx_state_components enabling function in the
CpuManager constructor. This way we can make changes to the AMX related
state components functionality without needing to update the
CpuManager's constructor.

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP <oliver.anderson@sap.com>
We will need to query KVM for the size of the xsave struct in a follow
up commit. This commit introduces the necessary method on the
hypervisor trait to do that.

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
AMX requires dynamically enabling certain large state components, which
leads to an increase in the size of kvm_xsave. This was not taken into
account by Cloud Hypervisor until now.

We solve this by refactoring `XSaveState` to directly wrap `kvm::Xsave`
and always ensure (via a static `OnceLock` variable) that all operations
on the wrapped xsave state obtain an instance of the right size.
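
A minimal sketch of the pattern, not the code from this commit:
`query_xsave_size` is a hypothetical stand-in for asking the
hypervisor, its return value is a placeholder, and `XsaveBuffer` is an
invented type that only mirrors the idea of `XSaveState`.

```rust
use std::sync::OnceLock;

/// Size of the xsave area in bytes, determined once per process.
static XSAVE_SIZE: OnceLock<usize> = OnceLock::new();

/// Hypothetical stand-in for querying the hypervisor for the size.
fn query_xsave_size() -> usize {
    // Placeholder value for illustration only.
    3 * 4096
}

/// Returns the cached size; the query runs on the first call only.
pub fn xsave_size() -> usize {
    *XSAVE_SIZE.get_or_init(query_xsave_size)
}

/// Buffer whose length always matches the cached xsave size.
pub struct XsaveBuffer {
    bytes: Vec<u8>,
}

impl XsaveBuffer {
    /// Allocates a zeroed buffer of the right size.
    pub fn new() -> Self {
        Self { bytes: vec![0; xsave_size()] }
    }

    /// Number of bytes in the buffer.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}
```

Because every constructor goes through `xsave_size()`, no caller can
create a buffer with a stale or hard-coded size.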

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
When enabling the AMX feature, we should call arch_prctl to request
permission to use tile data for the guest. The permission should be
requested before the first vCPU is created, so we need to call
arch_prctl in the VMM thread. This patch adds the arch_prctl syscall to
vmm_thread_rules.

Fixes: cloud-hypervisor#7516

Signed-off-by: Songqian Li <sionli@tencent.com>
As we have replaced all KVM_GET_XSAVE calls with KVM_GET_XSAVE2
we need to update the seccomp filters accordingly.

Signed-Off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Virtio PCI devices are created in a set of nested functions. Each of
these functions creates a vector to collect the devices it creates,
only for that vector to be appended to the vector of the next higher
nesting level. Those nested vectors are unnecessary as we can write
directly to the member.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Allocating a device ID is crucial for assigning a specific ID to a
device. We need this to implement configurable PCI BDF.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Next to tests for `allocate_device_bdf`, we introduce a new constructor
`new_without_address_manager`, only available in the test build. As
there is no way to instantiate an `AddressManager` in the tests, we use
this constructor to work around this.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Updates all config structs in order to make the new config option
available to all PCI devices. Additionally, update the parser so the new
option becomes available on the CLI.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
We use `VecDeque` to sort devices implicitly. Devices whose config
contains a fixed BDF are added to the front, while those without a BDF
given are added to the back. Processing the `VecDeque` sequentially
from first to last then ensures that no clashes occur when assigning
BDFs to devices. Otherwise, we could end up in the case that we assigned
a BDF required by one device's config to one without a BDF.
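
The ordering idea can be sketched as follows. This is a simplified
model, not the code from this commit: `DeviceConfig`, `order_devices`,
and `assign_ids` are invented names, and a BDF is reduced to a single
device ID in the range 0..32.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub struct DeviceConfig {
    pub name: &'static str,
    /// Requested device ID (0..32), if the user fixed one.
    pub fixed_id: Option<u8>,
}

/// Orders devices so that those with a fixed ID come first.
pub fn order_devices(configs: Vec<DeviceConfig>) -> VecDeque<DeviceConfig> {
    let mut queue = VecDeque::new();
    for cfg in configs {
        if cfg.fixed_id.is_some() {
            queue.push_front(cfg);
        } else {
            queue.push_back(cfg);
        }
    }
    queue
}

/// Assigns IDs front to back. Returns `None` if a fixed ID is already
/// taken or out of range, or if no free ID is left.
pub fn assign_ids(queue: VecDeque<DeviceConfig>) -> Option<Vec<(&'static str, u8)>> {
    let mut used = [false; 32];
    let mut result = Vec::new();
    for cfg in queue {
        let id = match cfg.fixed_id {
            Some(id) => id,
            // First free slot for devices without a fixed ID.
            None => used.iter().position(|taken| !taken)? as u8,
        };
        let slot = used.get_mut(usize::from(id))?;
        if *slot {
            return None;
        }
        *slot = true;
        result.push((cfg.name, id));
    }
    Some(result)
}
```

With the input order "net" (no ID), "rng" (no ID), "disk" (fixed ID 1),
processing the unsorted list hands ID 1 to "rng" and then fails for
"disk", whereas the reordered queue assigns "disk" first and succeeds.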

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
TLS connections have a TLS server (the endpoint that listens for a
connection) and a TLS client (the endpoint that initiates the
connection). This commit adds the code for the client side, which will
be the source host.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This is the TLS server side, which will be the live migration target.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
Also it seems like AsRawFd should be avoided
https://rust-lang.github.io/rfcs/3128-io-safety.html
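
For illustration, a small sketch of the I/O-safe style from that RFC,
using only the standard library on Unix. The function `describe_fd` is
a made-up example and not part of this change.

```rust
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, RawFd};

/// Accepts anything that can lend out a file descriptor. The
/// `BorrowedFd` cannot outlive `owner`, so the descriptor is
/// guaranteed to be open while it is used here.
pub fn describe_fd<F: AsFd>(owner: &F) -> RawFd {
    let borrowed: BorrowedFd<'_> = owner.as_fd();
    // The raw number is only extracted at the very last moment.
    borrowed.as_raw_fd()
}
```

Compared to passing a bare `RawFd` around, the borrow ties the
descriptor's validity to the lifetime of its owner.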

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This allows (more or less) transparent usage of TLS encrypted TCP
connections.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
For TLS we need certificates (and a key for the TLS server). This
commit adds parameters for that and encrypts the connection with TLS if
the necessary parameters are provided.

Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
The ReadVolatile and WriteVolatile implementations of TlsStream were
very slow, mainly because they allocated a large buffer on each
invocation. The TlsStreamWrapper carries a buffer that it uses for
ReadVolatile and WriteVolatile and that is allocated once on creation.
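
The idea of a wrapper with a buffer that is allocated once can be
sketched with plain `std::io::Read`. This is an assumption-laden
model: `ScratchReader` and `read_into` are invented names, and the
real code implements ReadVolatile and WriteVolatile on a TLS stream
rather than the method shown here.

```rust
use std::io::{self, Read};

/// Wraps a reader together with a scratch buffer that is allocated
/// once, when the wrapper is created.
pub struct ScratchReader<R> {
    inner: R,
    scratch: Vec<u8>,
}

impl<R: Read> ScratchReader<R> {
    pub fn new(inner: R, capacity: usize) -> Self {
        Self { inner, scratch: vec![0; capacity] }
    }

    /// Reads at most `min(dst.len(), capacity)` bytes through the
    /// scratch buffer and copies them to `dst`. No allocation happens
    /// on this path.
    pub fn read_into(&mut self, dst: &mut [u8]) -> io::Result<usize> {
        let want = dst.len().min(self.scratch.len());
        let n = self.inner.read(&mut self.scratch[..want])?;
        dst[..n].copy_from_slice(&self.scratch[..n]);
        Ok(n)
    }
}
```

The cost of the allocation is paid once in `new`; each call to
`read_into` only performs the read and one copy.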

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This is the first commit of the cherry-pick from [0].

[0] cloud-hypervisor#7525

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This will also prevent some useless rebuilds. Using `--verbose` we can
observe that the build.rs causes frequent useless rebuilds, so having
fewer of them is a good thing. They come from the dependency of
`build.rs` on the local git repository.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
TL;DR: cargo clippy|check|... now runs on whole workspace by default.

- add new workspace member `cloud-hypervisor`
- move `./src` to new workspace member
- move `./tests` to new workspace member
- move relevant parts from Cargo.toml to new workspace member
- kept necessary parts in main Cargo.toml, such as profile
  configurations

The main Cargo.toml historically mixes workspace and crate definitions
for cloud-hypervisor and ch-remote. This makes it hard to read and
requires `--workspace` to run cargo clippy or cargo test on all
workspace members, which is counter-intuitive.

This patch separates the workspace from the crate definition in the main
Cargo.toml file. After this, cargo clippy, cargo test, etc., work on the
whole workspace naturally, giving a smoother developer experience. The
Cargo.toml without a package definition is also called a virtual
workspace or virtual manifest by Cargo [0].

Backporting is not a concern: CHV no longer backports, and the affected
files are rarely modified anyway.

[0] https://doc.rust-lang.org/cargo/reference/workspaces.html#virtual-workspace

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Unrelated but necessary to also always format code for all
architectures and all features.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
It is no longer needed to add `--all` (which is an alias for
`--workspace`). The documentation says "Commands run in the workspace
root will run against all workspace members by default" [0].

We however still need `--tests` as this activates the building of
tests in `<crate>/tests` directories.

[0] https://doc.rust-lang.org/cargo/reference/workspaces.html#virtual-workspace

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This is the last commit of the cherry-pick from [1].

`cargo rustc` is incompatible with virtual manifests, so the CI needs to
use `cargo build` instead. However, passing `RUSTFLAGS="-D warnings"` via
the environment would propagate to all dependencies, and some of them
currently fail to build under `-D warnings` due to issues like [0]:

```
error: creating a mutable reference to mutable static
  --> src/temp.rs:97:5
   |
97 |     DIRS.pop()
   |     ^^^^^^^^^^ mutable reference to mutable static
```

To resolve this, apply `-D warnings` only to the `cargo clippy`
commands (which apply to our workspace only) and avoid enforcing it for
the entire cargo build.

[0]: https://github.com/cloud-hypervisor/cloud-hypervisor/actions/runs/19962283528/job/57245376263?pr=7525
[1]: cloud-hypervisor#7525

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>