Draft
rubenhorn commented Jan 21, 2026
}
float targetSubtreeCount = (float)(workerCount) * work_ratio;
/* Each node has up to 3 child nodes -> determine closest depth to match work_ratio */
size_t pre_fold_depth = (size_t)std::max(0ll, llround(log(targetSubtreeCount) / log(3.0)));
Wrong! Each node has up to dim + (dim - 1) = 2 * dim - 1 children, so the hardcoded 3.0 is only correct for dim = 2. Replace it with a value derived from dim!
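A minimal sketch of what the fix could look like, assuming `dim` is available at this point and that the branching factor is `2 * dim - 1` as stated above. The helper name `compute_pre_fold_depth` and its signature are hypothetical and not part of this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical helper: picks the tree depth whose subtree count is closest
// (on a log scale) to workerCount * work_ratio.
// Assumption: each node has up to dim + (dim - 1) = 2 * dim - 1 children.
static std::size_t compute_pre_fold_depth(int workerCount, float work_ratio, int dim) {
    const double branching = static_cast<double>(2 * dim - 1);
    const double targetSubtreeCount = static_cast<double>(workerCount) * work_ratio;
    // A branching factor of 1 (dim == 1) would divide by log(1) == 0, and a
    // target of at most one subtree needs no pre-folding at all.
    if (branching <= 1.0 || targetSubtreeCount <= 1.0) {
        return 0;
    }
    const long long depth =
        std::llround(std::log(targetSubtreeCount) / std::log(branching));
    return static_cast<std::size_t>(std::max(0LL, depth));
}
```

For `dim = 2` this reproduces the current behavior (base 3). The guard for `dim = 1` is an addition of this sketch, since the original expression does not have to deal with a branching factor of 1.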
I implemented the naive approach mentioned in #70 using OpenMP.
Unit tests exist, but the scaling behavior and parallel efficiency of this implementation for different instances have not yet been investigated.
⚠️ This algorithm evaluates more solutions than the serial version, because the enumerated subtrees cannot (yet) share information about the global optimum with each other.
The implementation therefore aims for real-world speed-ups on parallel CPU architectures rather than for fewer evaluations.
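To illustrate the trade-off, here is a simplified sketch of the naive scheme. It is not the code from this PR: `solve_subtree` and `solve_parallel` are hypothetical names, each subtree is reduced to a plain list of leaf costs, and the use of a `min` reduction is an assumption. The point is only that every subtree is solved with its own local bound and the results are combined at the very end.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for solving one pre-enumerated subtree serially.
// It only sees its own local best, never the results of other subtrees.
static double solve_subtree(const std::vector<double>& leaf_costs) {
    double local_best = std::numeric_limits<double>::infinity();
    for (double c : leaf_costs) {
        local_best = std::min(local_best, c);
    }
    return local_best;
}

// Solves all subtrees independently and combines their optima afterwards.
static double solve_parallel(const std::vector<std::vector<double>>& subtrees) {
    double best = std::numeric_limits<double>::infinity();
    const long long n = static_cast<long long>(subtrees.size());
    // Each thread keeps a private copy of `best`; copies are merged after the loop.
    #pragma omp parallel for schedule(dynamic) reduction(min : best)
    for (long long i = 0; i < n; ++i) {
        best = std::min(best, solve_subtree(subtrees[static_cast<std::size_t>(i)]));
    }
    return best;
}
```

Because no bound is exchanged while the loop runs, a subtree cannot be pruned using a better solution found elsewhere, which is where the extra evaluations come from.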
A note on the checkpoint feature implemented in #77...
This feature is currently not supported in the parallel implementation, because all threads would use the same checkpoint file, and neither the order of the loop iterations nor their assignment to threads is deterministic.
One option is to simply disable checkpointing for the parallel version of the algorithm:
However, this is a very sloppy way of dealing with it and potentially dangerous when the caller is using multiple threads to solve multiple instances, since the environment is shared.
A better idea would be to rewrite the parallel `for`-loop to use the master-worker pattern with OpenMP tasks. Then only the current optimum and the remaining subtrees (identified by their index) need to be stored. Any progress on subtrees that are being solved at the time of the checkpoint is lost. Upon restart, the subtrees are enumerated again and as many subtrees as were previously solved are removed from the start of the task queue.
I guess `depth_first_bnb` still needs to know not to load or create checkpoints, but this could be handled with another parameter or by modifying the semantics of `bool is_pre_folded`.
MPI and multi-node parallelism
I think MPI is not a good fit here, because it has some disadvantages.
TODOs