Experimental OpenMP support #81

Draft
rubenhorn wants to merge 10 commits into okkevaneck:develop from rubenhorn:experimental/depth_first_bnb_parallel

Conversation

@rubenhorn (Contributor)

I implemented the naive approach mentioned in #70 using OpenMP.

Unit tests exist, but the scaling behavior and parallel efficiency of this implementation for different instances have not yet been investigated.
⚠️ This algorithm evaluates more solutions than the serial version, because the threads enumerating different subtrees cannot (yet) share information about the global optimum.
This implementation is therefore geared towards achieving real-world speed-ups by leveraging parallel CPU architectures, rather than by performing fewer evaluations.

A note on the checkpoint feature implemented in #77...

This feature is currently not supported in the parallel implementation, because every thread would use the same checkpoint file, and the order and assignment of loop indices are not deterministic.
One option is to simply disable checkpointing for the parallel version of the algorithm:

#include <cstdlib>  /* std::getenv, setenv/unsetenv (POSIX), _putenv_s (Windows) */
#include <iostream>
#include <string>

const char* varName = "PROSPR_CACHE_DIR";
const char* oldValue = std::getenv(varName);
bool existed = (oldValue != nullptr);
std::string backup;
if (existed) {
   backup = oldValue;
   std::cerr << "[Warning] Checkpointing is not supported with OpenMP.\n";
#ifdef _WIN32
   _putenv_s(varName, ""); /* An empty value removes the variable. */
#else
   unsetenv(varName);
#endif
}
/* Solve instance here. */
if (existed) {
#ifdef _WIN32
   _putenv_s(varName, backup.c_str());
#else
   setenv(varName, backup.c_str(), 1);
#endif
}

However, this is a very sloppy way of dealing with it, and it is potentially dangerous when the caller uses multiple threads to solve multiple instances, since the environment is shared process-wide.
A better idea would be to rewrite the parallel for-loop to use the master-worker pattern with OpenMP tasks. Then, only the current optimum and the remaining subtrees (identified by their index) need to be stored. Any progress on subtrees currently being solved is lost; upon restart, the subtrees are enumerated again and as many subtrees as were previously solved are removed from the start of the task queue.
I guess depth_first_bnb still needs to know not to load/create checkpoints, but this could be handled with another parameter or with modified semantics of bool is_pre_folded.

MPI and multi-node parallelism

I think MPI is not a good fit here, for two reasons.

  1. The small amount of computation per node of the tree, relative to the number of nodes, results in excessive communication/synchronization overhead. I think it is very likely that practical scalability is therefore rather limited and that a single node will suffice.
  2. Shared memory parallelism (as opposed to distributed memory in MPI) presents opportunities for nice optimization tricks. As noted in this comment, by sharing the global optimum across all threads, more aggressive pruning (close to that of the serial algorithm) could be achieved.

TODOs

@rubenhorn rubenhorn changed the base branch from master to develop November 9, 2025 21:04
@rubenhorn rubenhorn marked this pull request as ready for review November 12, 2025 19:50
@rubenhorn rubenhorn marked this pull request as draft November 12, 2025 19:50
float targetSubtreeCount = (float)(workerCount) * work_ratio;
/* Each node has up to 3 child nodes -> determine closest depth to match work_ratio */
size_t pre_fold_depth = (size_t)std::max(0ll, llround(log(targetSubtreeCount) / log(3.0)));
@rubenhorn (Contributor, Author)
Wrong! Each node has dim+dim-1 children. (Replace hardcoded 3.0!)
