Move instruction profiling to a dedicated thread#526
Move instruction profiling to a dedicated thread#526mohitejaikumar wants to merge 3 commits intosolana-foundation:mainfrom
Conversation
|
@lgalabru I’d really appreciate your review and any feedback you may have. |
|
This is really awesome, @mohitejaikumar! I still haven't gotten to test yet so I'm not ready to sign-off. One concern I have - currently this code: let ix_profiles = match ix_profile_rx {
Some(rx) => tokio::task::block_in_place(|| rx.recv().ok().flatten()),
None => None,
};is happening directly after the transaction is processed for the real result. So in the original implementation, the an instruction profile for a transaction with 10 instructions executed 10 transactions: Your implementation slightly parallelizes by spawning a thread for 1-9, awaiting tx 10, then awaiting the thread: So this only shortens the number of txs the user is waiting on by 1. Can we slightly expand this approach. Rather than waiting on the instruction profiling thread to complete, we return the original result without ix profiles and append them later? Later, if the user fetches the profile result for a signature/uuid, we'll fetch whatever we have, which is likely to include the profile result. Thoughts @mohitejaikumar, @lgalabru? |
5419677 to
2129ad9
Compare
|
@mohitejaikumar Do you think you'll be able to address my comment? |
|
@MicaiahReid yes sir I wanna try it, giving it a try today, was bussy with some work. |
|
MicaiahReid
left a comment
There was a problem hiding this comment.
Awesome work @mohitejaikumar! This is really great. I've left some comments to get this over the finish line!
| } | ||
| }; | ||
|
|
||
| setup_profiling(&svm_locker); |
There was a problem hiding this comment.
Let's wrap this in an:
if simnet_config.instruction_profiling_enabled {
setup_profiling(&svm_locker);
}| let _ = profiling_job_tx.send(job); | ||
| Some(result_rx) | ||
| } | ||
| else { |
There was a problem hiding this comment.
If instruction profiling is enabled, and there is no profiling_job_tx, I assume this means there was a configuration error (setup_profiling wasn't called). Can we log a warning in this case?
The runloop file does call setup_profiling, so in the default case this won't ever be an issue, but if someone is trying to directly consume surfpool as an SDK and isn't using the runloop functions, it could be a helpful warning.
| KeyedProfileResult::new( | ||
| latest_absolute_slot, | ||
| UuidOrSignature::Signature(signature), | ||
| None, |
There was a problem hiding this comment.
So previously we would immediately return ix profiles, now that obviously isn't possible. But this None could maybe be misleading - like there is no ix profile at all, or profiling is disabled. What if the profile results were stored in an enum:
enum InstructionProfiles {
// ix profiling disabled, serializes to just `null` like the `None` variant of our option would
Unavailable,
// what we would return in this line if profiling is enabled - indicates to the user that results will be available later
Pending,
// what is filled in upon completion
Ready(Vec<ProfileResult>),
// what is filled in if profiling fails
Failed(String),
}We'd then have to have this serialized as such:
{
"instructionProfiles": null,
"instructionProfilesStatus": "unavailable"
}{
"instructionProfiles": null,
"instructionProfilesStatus": "pending"
}{
"instructionProfiles": [ ... ],
"instructionProfilesStatus": "ready"
}{
"instructionProfiles": null,
"instructionProfilesStatus": "failed: {error}"
}| pre_execution_capture: pre_execution_capture.clone(), | ||
| result_tx, | ||
| }; | ||
| let _ = profiling_job_tx.send(job); |
There was a problem hiding this comment.
So this .send is blocking. If our rx end of this gets filled up (bounded to 128 messages in the queue), this send will wait for the queue to free up a spot. This means the actual tx processing (which happens below) is being delayed.
Instead let's us try_send and handle errors. Something like:
match profiling_job_tx.try_send(job) {
Ok(()) => Some(result_rx),
Err(crossbeam_channel::TrySendError::Full(job)) => {
// log warning or emit event saying this transaction will not be profiled
None
}
Err(crossbeam_channel::TrySendError::Disconnected(job)) => {
// worker is gone; log an error and just disable ix profiling so we don't log again and again
None
}
}
Implements #320
start_profiling_runloopthat spawns a long-lived "Instruction Profiler" thread viahiro_system_kit::thread_named.ProfilingJobstruct - Packages all data needed for profiling (cloned SVM, transaction, accounts, etc.) to send across the thread boundary.Option<Sender<ProfilingJob>>toSurfnetSvm.fetch_all_tx_accounts_then_process_tx_returning_profile_resnow dispatches the profiling job to the dedicated thread before callingprocess_transaction_internal.setup_profilinghelper - Extracted profiling channel setup into a reusable function called from bothstart_local_surfnet_runloopandtests.ClonetoIndexedLoadedAddressesandTransactionLoadedAddressesto enable moving data into the profiling job.poll_for_instruction_profileshelper inhelpers.rsthat pollsget_profile_resultevery50msuntilinstruction_profilesisSomeor a10stimeout is reachedtest_profile_transaction_multi_instruction_basic,test_profile_transaction_token_transfer,test_profile_transaction_multi_instruction_failure,test_ix_profiling_with_alt_tx,test_instruction_profiling_does_not_mutate_state.