
Conversation

@smklein
Collaborator

@smklein smklein commented Oct 17, 2025

Re-structures support bundle collection into tasks.

This centralizes step dispatching and makes it slightly clearer that tasks are independent (where they can be).

Comment on lines +717 to +718
let mut tasks =
ParallelTaskSet::new_with_parallelism(MAX_CONCURRENT_STEPS);

Collaborator Author

Previously, we had some ParallelTaskSets embedded within the collection of sub-pieces of the bundle.

This new "step-based" infrastructure shares that more broadly: everything should use this one ParallelTaskSet (which is good, I think - it prevents a task from spawning another task set that spawns more task sets; everything is just a unit of work that gets added to this one set).

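As an illustration of that idea (not code from this PR), here is a minimal, self-contained sketch that drives every unit of work through one shared task set, using a plain tokio JoinSet with a parallelism cap in place of omicron's ParallelTaskSet; the Step type, run_steps, and the step names are all made up:

use std::future::Future;
use std::pin::Pin;
use tokio::task::JoinSet;

/// One unit of bundle-collection work; finishing a step may yield
/// follow-up steps. (Hypothetical type, standing in for the PR's
/// step functions and their output.)
struct Step {
    name: &'static str,
    fut: Pin<Box<dyn Future<Output = Vec<Step>> + Send>>,
}

impl Step {
    fn new(
        name: &'static str,
        fut: impl Future<Output = Vec<Step>> + Send + 'static,
    ) -> Self {
        Self { name, fut: Box::pin(fut) }
    }
}

const MAX_CONCURRENT_STEPS: usize = 8;

/// Drive every step through one shared task set, feeding any follow-up
/// work back into the same queue.
async fn run_steps(mut steps: Vec<Step>) {
    let mut tasks = JoinSet::new();
    loop {
        // Spawn queued steps up to the parallelism cap.
        while tasks.len() < MAX_CONCURRENT_STEPS {
            let Some(step) = steps.pop() else { break };
            println!("running step: {}", step.name);
            tasks.spawn(step.fut);
        }
        // Join one step; anything it returned becomes new queued work.
        match tasks.join_next().await {
            Some(Ok(follow_ups)) => steps.extend(follow_ups),
            Some(Err(_)) => {} // a panicked step is simply dropped here
            None if steps.is_empty() => return,
            None => {}
        }
    }
}

#[tokio::main]
async fn main() {
    run_steps(vec![
        Step::new("host ereports", async { Vec::<Step>::new() }),
        Step::new("list SPs", async {
            // Listing SPs produces one follow-up step per SP.
            (0..4)
                .map(|_| Step::new("save SP dump", async { Vec::<Step>::new() }))
                .collect::<Vec<Step>>()
        }),
    ])
    .await;
}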

let ereport_collection = if let Some(ref ereport_filters) =
self.request.ereport_query
async fn collect_host_ereports(

Collaborator Author

The prior mechanism of collecting ereports from {host, sp} did a bit of manual task management, and stored atomics within the bundle itself.

I've just made each part of the ereport collection a distinct step: no more atomics, no more manual tokio tasks - each one is just "one step" with output.

Member

Yup, that makes sense -- I had just wanted to make sure the (theoretically, totally separate) operations for host OS and SP ereports could execute in parallel. Since that's now a first-class aspect of the overall design, all that nonsense can be removed!

Comment on lines +887 to +914
async fn get_or_initialize_mgs_client<'a>(
    &self,
    mgs_client: &'a OnceCell<Arc<Option<MgsClient>>>,
) -> &'a Arc<Option<MgsClient>> {
    mgs_client
        .get_or_init(|| async {
            Arc::new(self.create_mgs_client().await.ok())
        })
        .await
}

const MAX_CONCURRENT_SLED_REQUESTS: usize = 16;
const FAILURE_MESSAGE: &str =
    "Failed to fully collect support bundle info from sled";
let mut set = ParallelTaskSet::new_with_parallelism(
    MAX_CONCURRENT_SLED_REQUESTS,

async fn get_or_initialize_all_sleds<'a>(
    &self,
    all_sleds: &'a OnceCell<Arc<Option<Vec<Sled>>>>,
) -> &'a Arc<Option<Vec<Sled>>> {
    all_sleds
        .get_or_init(|| async {
            Arc::new(
                self.datastore
                    .sled_list_all_batched(
                        &self.opctx,
                        SledFilter::InService,
                    )
                    .await
                    .ok(),
            )
        })
        .await

Collaborator Author

I'm using lazily-initialized variables for "data/clients that might get created as a part of bundle collection, but might not".

This becomes more relevant with #9254, when we may or may not need these values to get initialized at all.

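For illustration, here is a tiny self-contained example of this lazy-init pattern with tokio's OnceCell (ExpensiveClient, make_client, and get_or_init_client are made-up stand-ins for the MGS client / sled list above): the init future runs at most once, even if several steps race to call get_or_init concurrently.

use std::sync::Arc;
use tokio::sync::OnceCell;

#[derive(Debug)]
struct ExpensiveClient;

async fn make_client() -> Option<ExpensiveClient> {
    // Imagine a fallible lookup + connection here.
    Some(ExpensiveClient)
}

async fn get_or_init_client(
    cell: &OnceCell<Arc<Option<ExpensiveClient>>>,
) -> &Arc<Option<ExpensiveClient>> {
    cell.get_or_init(|| async { Arc::new(make_client().await) }).await
}

#[tokio::main]
async fn main() {
    let cell = OnceCell::new();
    // Both calls return the same Arc; make_client only ran once.
    let a = get_or_init_client(&cell).await;
    let b = get_or_init_client(&cell).await;
    assert!(Arc::ptr_eq(a, b));
}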

let mut extra_steps: Vec<(&'static str, CollectionStepFn)> = vec![];
for sp in get_available_sps(&mgs_client).await? {
    extra_steps.push((

Collaborator Author

Rather than collecting all SP reports via new tokio tasks, these create new "steps" that get kicked back to the top-level ParallelTaskSet I mentioned earlier.

@smklein smklein requested review from hawkw and wfchandler October 17, 2025 23:49

Member

@hawkw hawkw left a comment

Overall, I like this design. Most of my notes were pretty small, but I did wonder if we might want to use a channel for the queue of new tasks to be spawned.

Comment on lines 720 to 757
loop {
    // Process all the currently-planned steps
    while let Some((step_name, step)) = steps.pop() {
        let previous_result = tasks.spawn({
            let collection = self.clone();
            let dir = output.path().to_path_buf();
            async move {
                debug!(collection.log, "Running step"; "name" => &step_name);
                step(&collection, dir.as_path()).await.inspect_err(|err| {
                    warn!(
                        collection.log,
                        "Step failed";
                        "name" => &step_name,
                        InlineErrorChain::new(err.as_ref()),
                    );
                })
            }
        }).await;

        if let Some(Ok(output)) = previous_result {
            output.process(&mut report, &mut steps);
        };
    }

    // If we've run out of tasks to spawn, join all the existing steps.
    while let Some(previous_result) = tasks.join_next().await {
        if let Ok(output) = previous_result {
            output.process(&mut report, &mut steps);
        };
    }

    // Executing steps may create additional steps, as follow-up work.
    //
    // Only finish if we've exhausted all possible steps and joined all spawned work.
    if steps.is_empty() {
        return report;
    }
}

Member

I kind of wonder if we might want to consider making steps an MPSC channel that each step has the Sender side cloned into, instead of having the output of the step future return a list of additional steps. That way, new tasks can be spawned as soon as there's space on the task set, instead of having to wait for the second join_next() loop to drain all the currently running tasks before spawning any new steps. With the current design, there's some additional latency while we're in the second loop that could be avoided this way.

Also, we end up with a potentially big vec sitting around in memory, which could be avoided if we spawned the tasks as soon as there was room for them.

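A rough sketch of that channel-based alternative (hypothetical code, not from this PR): each step is handed a Sender clone so it can enqueue follow-up work at any point while it runs, and the runner spawns from the channel as soon as the task set has room. Step, run_steps, and BoxFut are made-up names.

use std::future::Future;
use std::pin::Pin;
use tokio::sync::mpsc;
use tokio::task::JoinSet;

const MAX_CONCURRENT_STEPS: usize = 8;

type BoxFut = Pin<Box<dyn Future<Output = ()> + Send>>;

/// A unit of work that can enqueue follow-ups while it runs, by sending
/// them over the channel it was handed.
struct Step {
    run: Box<dyn FnOnce(mpsc::UnboundedSender<Step>) -> BoxFut + Send>,
}

impl Step {
    fn new<F, Fut>(f: F) -> Self
    where
        F: FnOnce(mpsc::UnboundedSender<Step>) -> Fut + Send + 'static,
        Fut: Future<Output = ()> + Send + 'static,
    {
        Step {
            run: Box::new(move |tx: mpsc::UnboundedSender<Step>| -> BoxFut {
                Box::pin(f(tx))
            }),
        }
    }
}

async fn run_steps(initial: Vec<Step>) {
    let (tx, mut rx) = mpsc::unbounded_channel();
    for step in initial {
        tx.send(step).ok();
    }
    let mut tasks = JoinSet::new();
    loop {
        if tasks.is_empty() {
            // Nothing is running, so no new work can show up beyond what's
            // already queued: drain the queue, or finish.
            match rx.try_recv() {
                Ok(step) => {
                    tasks.spawn((step.run)(tx.clone()));
                }
                Err(_) => return,
            }
            continue;
        }
        tokio::select! {
            // Spawn queued work as soon as a slot is free...
            Some(step) = rx.recv(), if tasks.len() < MAX_CONCURRENT_STEPS => {
                tasks.spawn((step.run)(tx.clone()));
            }
            // ...and reap finished steps to free up slots.
            Some(_) = tasks.join_next() => {}
        }
    }
}

#[tokio::main]
async fn main() {
    run_steps(vec![Step::new(|tx| async move {
        // Mid-run, enqueue a follow-up step without waiting to finish.
        tx.send(Step::new(|_| async {})).ok();
    })])
    .await;
}

Whether that's worth the extra moving parts is basically the trade-off discussed in the replies below.
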
Collaborator Author

So I sat with this for a little bit. You're right about the second join_next loop causing an unnecessary latency bubble - but I think this can be fixed (as I did in 70f0bf1) with a continue statement that "tries to spawn more tasks" as soon as anything gets joined here.

I'm less sure about the benefits of using an mpsc here: with a channel, each task would send whatever work it wants executed, and the "runner" would need to be recv-ing on this channel, spawning all the recv'd work.

But I think (and could be wrong!) there's no need to add this extra layer of indirection: since the runner can absorb all the output of individual tasks anyway, we either have:

  • a queue of work in the mpsc - filled by tasks, and which would need to be polled asynchronously by the task runner, OR
  • a queue of work in a vec (called steps) - filled by pushing onto it with task output, and which can be checked with synchronous code (as we're doing in this PR)

Since we're asynchronously joining tasks anyway, it seems unnecessary to "asynchronously join a task, and separately asynchronously recv a value it might want to send" -- we could just do both together? But I think there's going to be a pending in-memory structure either way: either a vec of steps, or an mpsc filled with pending work to be done.

Member

Yeah, I think you're right that just adding a continue in the second non-spawning join_next loop also solves the little latency bubble. So that's fine too.

Regarding

Since we're asynchronously joining tasks anyway, it seems unnecessary to "asynchronously join a task, and separately asynchronously recv a value it might want to send" -- we could just do both together? But I think there's going to be a pending in-memory structure either way: either a vec of steps, or an mpsc filled with pending work to be done.

One note on this subject is that a Vec/VecDeque will retain an allocation of O(max number of tasks that were ever queued) for as long as it exists, unless it's explicitly shrink_to_fit'd and does a big memcpy in order to copy into a smaller allocation. The Tokio MPSC channel (as we became uncomfortably familiar with last week) is backed by a linked list of chunks, and IIRC, it will deallocate some of those chunks as messages are received from the queue, so it's a bit nicer about releasing allocated memory. Whether or not that matters at all depends on how long the support bundle collection lives for and the peak number of queued tasks...so basically this probably doesn't matter, I bet.

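A tiny illustration of the allocation-retention point (std-only, not from the PR): a Vec holds on to its peak capacity after it's drained, unless it's explicitly shrunk.

fn main() {
    let mut steps: Vec<u64> = Vec::new();
    steps.extend(0..10_000);
    steps.clear();

    // Empty, but still holding the peak allocation.
    assert!(steps.is_empty());
    assert!(steps.capacity() >= 10_000);

    // Releasing the memory requires an explicit shrink (and, for a
    // non-empty Vec, a reallocation + copy into a smaller buffer).
    steps.shrink_to_fit();
    assert!(steps.capacity() < 10_000);
}
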
Member

@hawkw hawkw Oct 28, 2025

The one other potential advantage of an MPSC is that one task can spawn any number of new tasks at any point in its lifespan by sending them over the MPSC one at a time, instead of returning a Vec of new work to spawn when it completes.


) -> anyhow::Result<CollectionStepOutput> {
    save_sp_dumps(mgs_client, sp, dir)
        .await
        .with_context(|| format!("SP {} {}", sp.type_, sp.slot))?;

Member

does this error get formatted in a way that makes it clear it was dump collection related?

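For reference, a hedged sketch (made-up names, not the PR's code) of how anyhow renders context added via with_context: the context string becomes the outermost message, with the original error underneath it in the chain, so whether "dump collection" shows up depends entirely on what goes into that format! string.

use anyhow::{anyhow, Context, Result};

fn save_sp_dumps_stub() -> Result<()> {
    Err(anyhow!("transceiver dump unavailable"))
}

fn collect(sp_type: &str, slot: u32) -> Result<()> {
    // A more descriptive context string than "SP {type} {slot}".
    save_sp_dumps_stub()
        .with_context(|| format!("saving SP dumps for {sp_type} {slot}"))
}

fn main() {
    let err = collect("sled", 7).unwrap_err();
    // Alternate Display shows the whole chain on one line:
    //   saving SP dumps for sled 7: transceiver dump unavailable
    println!("{err:#}");
    // Debug shows the same chain as a "Caused by:" list.
    println!("{err:?}");
}
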
Collaborator Author

Updated to make it more clear in 70f0bf1

Contributor

@wfchandler wfchandler left a comment

No concerns beyond what Eliza has already noted. A great cleanup, thanks!

HostEreports(SupportBundleEreportStatus),
SpEreports(SupportBundleEreportStatus),
SavingSpDumps { listed_sps: bool },
// NOTE: The ditinction between this and "Spawn" is pretty artificial -

Contributor

Nit: typo

Suggested change
// NOTE: The ditinction between this and "Spawn" is pretty artificial -
// NOTE: The distinction between this and "Spawn" is pretty artificial -

Collaborator Author

Done in 70f0bf1

@wfchandler
Contributor

Nothing further from me, thanks!

Member

@hawkw hawkw left a comment

looks good to me, with a couple last nitpicks

Comment on lines +830 to +842

    // As soon as any task completes, see if we can spawn more work
    // immediately. This ensures that the ParallelTaskSet is
    // saturated as much as it can be.
    continue;
}

// Executing steps may create additional steps, as follow-up work.
//
// Only finish if we've exhausted all possible steps and joined all spawned work.
if steps.is_empty() {
    return report;
}

Member

hmm, actually, doesn't this simplify to

Suggested change
    // As soon as any task completes, see if we can spawn more work
    // immediately. This ensures that the ParallelTaskSet is
    // saturated as much as it can be.
    continue;
}

// Executing steps may create additional steps, as follow-up work.
//
// Only finish if we've exhausted all possible steps and joined all spawned work.
if steps.is_empty() {
    return report;
}

    // As soon as any task completes, see if we can spawn more work
    // immediately. This ensures that the ParallelTaskSet is
    // saturated as much as it can be.
    continue;
} else if steps.is_empty() {
    // Executing steps may create additional steps, as follow-up work.
    //
    // Only finish if we've exhausted all possible steps and joined all
    // spawned work.
    return report;
}

Err((n_collected, err)) => {
    warn!(
        &self.log,
        "Support bundle: sp ereport collection failed \

Member

turbo nit:

Suggested change
"Support bundle: sp ereport collection failed \
"Support bundle: SP ereport collection failed \
