Skip to content

lifecycle: Add exception handling for CPU affinity#113

Open
SangKyeong-Jeong wants to merge 4 commits intoeclipse-score:mainfrom
SangKyeong-Jeong:lifecycle/bugfix/affinity-exception-handling
Open

lifecycle: Add exception handling for CPU affinity#113
SangKyeong-Jeong wants to merge 4 commits intoeclipse-score:mainfrom
SangKyeong-Jeong:lifecycle/bugfix/affinity-exception-handling

Conversation

@SangKyeong-Jeong
Copy link
Contributor

When there are 32 CPU cores, an issue arises where the process cannot be executed, and when there are more than 32 cores, a problem occurs where fewer cores than expected are used. To resolve this issue, exception handling is added to set the maximum number of cores using the uint32_t type when there are more than 32 cores.

@github-actions
Copy link

github-actions bot commented Mar 11, 2026

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run --lockfile_mode=error //:license-check

Status: ⚠️ Needs Review

Click to expand output
[License Check Output]
Extracting Bazel installation...
Starting local Bazel server (8.4.2) and connecting to it...
INFO: Invocation ID: 58dd8f8e-8d14-420b-ad51-aaa82121083f
Computing main repo mapping: 
Computing main repo mapping: 
Computing main repo mapping: 
WARNING: For repository 'score_rust_policies', the root module requires module version score_rust_policies@0.0.3, but got score_rust_policies@0.0.5 in the resolved dependency graph. Please update the version in your MODULE.bazel or set --check_direct_dependencies=off
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: 
Loading: 0 packages loaded
    currently loading: 
Loading: 0 packages loaded
    currently loading: 
Analyzing: target //:license-check (1 packages loaded, 0 targets configured)
Analyzing: target //:license-check (1 packages loaded, 0 targets configured)

Analyzing: target //:license-check (53 packages loaded, 9 targets configured)

Analyzing: target //:license-check (109 packages loaded, 39 targets configured)

Analyzing: target //:license-check (152 packages loaded, 2669 targets configured)

Analyzing: target //:license-check (154 packages loaded, 4255 targets configured)

Analyzing: target //:license-check (165 packages loaded, 7889 targets configured)

Analyzing: target //:license-check (165 packages loaded, 7901 targets configured)

Analyzing: target //:license-check (165 packages loaded, 7901 targets configured)

Analyzing: target //:license-check (165 packages loaded, 7901 targets configured)

Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)

Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)

Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)

Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)

Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)

INFO: Analyzed target //:license-check (170 packages loaded, 10039 targets configured).
[13 / 16] JavaToolchainCompileClasses external/rules_java+/toolchains/platformclasspath_classes; 0s disk-cache, processwrapper-sandbox
[15 / 16] [Sched] Building license.check.license_check.jar ()
INFO: Found 1 target...
Target //:license.check.license_check up-to-date:
  bazel-bin/license.check.license_check
  bazel-bin/license.check.license_check.jar
INFO: Elapsed time: 28.252s, Critical Path: 2.38s
INFO: 16 processes: 12 internal, 3 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 16 total actions
INFO: Running command line: bazel-bin/license.check.license_check ./formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

@github-actions
Copy link

The created documentation from the pull request is available at: docu-html

@FScholPer
Copy link
Contributor

@SangKyeong-Jeong whats the status here can we review it

Copy link
Contributor

@FScholPer FScholPer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please check also the OS Abstraction Layer

@daeyoung-jeong-lge
Copy link

@FScholPer We will double-check the OSAL implementation files (specifically under src/internal/osal/linux and qnx) to ensure the CPU core limit is handled consistently across different platforms. We'll update the PR if any adjustments are needed.
And when we are ready, is it OK for us to change the status of this issue to "Ready for review"?

@SangKyeong-Jeong
Copy link
Contributor Author

@FScholPer

This issue is unrelated to the OSAL.

When the CPU core count is 32, the return value of the ConfigurationManager::kDefaultProcessorAffinityMask() function is 0, causing an error when setting the CPU affinity to 0.

Shifting a 32-bit integer by 32 bits results in a shift-count-overflow error, which leads to undefined behavior. Consequently, the return value of ConfigurationManager::kDefaultProcessorAffinityMask() becomes 0, causing this issue.

@SangKyeong-Jeong SangKyeong-Jeong marked this pull request as ready for review March 12, 2026 01:02
@daeyoung-jeong-lge
Copy link

@FScholPer
As SangKyeong said, regarding to OSAL, this issue is not related. If we find some issue on OSAL, we will handle it as the separated issue and PR.

Please review this. Thanks.

Shifting a 32-bit number by 32 bits causes undefined behavior,
so modify it to perform the shift operation on a 64-bit number
and then cast it.
@daeyoung-jeong-lge
Copy link

We encounter the Formatting checks fail. It looks like the temporal download connection problem, so I want to re-run the batch, but maybe don't have a permission to do.

Copy link
Contributor

@pawelrutkaq pawelrutkaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase branch

const uint32_t ConfigurationManager::kDefaultProcessExecutionError = 1U;
uint32_t ConfigurationManager::kDefaultProcessorAffinityMask() {
return (1U << osal::getNumCores()) - 1U;
return static_cast<uint32_t>((1ULL << osal::getNumCores()) - 1ULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there are 64 cores, is there still the issue or ? ;) In general, this affinity in LM supports only up to 32 cores, since later on cpu_mask_ also does not support more. So either we do an assert is there is more than 32 cores, or we do wanirng logs that default affinity is set only to first 32 cores ;)

Copy link

@daeyoung-jeong-lge daeyoung-jeong-lge Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR addresses the Undefined Behavior on 32-core systems. As chungsky mentioned, getNumCores() already caps the count at 32, so >32 core systems are safe with this default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what i am saying that we shall probably add at least a warning (since you are already fixing issues around it) that it was capped somehwere, since defult behaviour for ie 32 and 33 cores is completely different. @NicolasFussberger what do you think ?

Copy link
Contributor

@NicolasFussberger NicolasFussberger Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, logging a warning could be helpful.
However, that might not be trivial. Probably you do not want to log the warning again and again every time osal::getNumCores() is called but only once during startup. So I think it would require refactoring getNumCores() method or adding another method for validation purposes.

The behaviour in case of > 32 cores seems reasonable to me. LaunchManager will just use the first 32 cores.

We could also add support for 64 cores if that is required, but that is probably a separate task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a short investigation it looks like we are hitting an undefined behavior here...
For reference pls have look at C Standard (C18, ISO/IEC 9899:2018) section 6.5.7 Bitwise shift operators and C++ Standard (C++20, ISO/IEC 14882:2020) section 7.6.2.2 Bitwise shift operators [expr.shift].

If the value of the right operand is negative, or greater than or equal to the number of bits in the promoted left operand, the behavior is undefined.

IMHO it looks to me that hardware is not really doing the logical thing here. Please consider following example:

#include <cstdint>
#include <iostream>

void shiftLeftTest(std::uint32_t num) {
    std::uint32_t first_result = 1U << num;
    std::uint32_t second_result = 1U << 32;

    std::cout << "Shifting left 1U by " << num << std::endl;
    std::cout << "first_result --> " << first_result << std::endl;
    std::cout << "second_result --> " << second_result << std::endl;
}

int main() {
    shiftLeftTest(32);
}

This code will print following text:

Shifting left 1U by 32
first_result --> 1
second_result --> 0

So essentially compiler will calculate something different than hardware.
For this reason maybe we should go for the following code that especially address this case. This may be handy if we support 64 cores in the future.

uint32_t ConfigurationManager::kDefaultProcessorAffinityMask() {
    constexpr uint32_t BITS_PER_BYTE = 8;
    uint32_t bitMask = 0U;
    uint32_t cores = osal::getNumCores();

    uint32_t maskSize = sizeof(bitMask) * BITS_PER_BYTE;

    return ( cores >= maskSizeInBits ? -1U : (1U << cores) - 1U );
}

Any opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, true. For me fine. @SimonKozik / @NicolasFussberger can we create an gtihub issue to support 64 cores and/or provide a warning indication to the user if there is still more core ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To resolve the Undefined Behavior issue that occurs with 32 cores, is there any problem with the method I applied in the patch, which performs the shift operation using a 64-bit integer (1ULL) and then casts it?

Since the return type of the kDefaultProcessorAffinityMask function is uint32_t, returning -1U seems awkward because it applies a negative sign to an unsigned type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that 32 cores is a bit hard coded now and changing the maximum to 64, will require some changes in few files. For this reason it will be hard to argue, that we should make this function very future proof.

May main concern is the fact, that the change proposed in this PR is just masking the problem. Not really solving it. So if we in the future increase the mask size to 64-bit, your code will step into the same problem. Am I right here?

I will argue that we should document the root problem we are facing here. This probably will be in the form of a comment.
Apart of that we should have fix.
Fix can be in the form proposed inside my comment, or the fix can be as proposed in PR. But if we are going to go with 1ULL then we should document that we are avoiding the undefined behavior by increasing the size of the type on which we are performing calculation.

I just want to avoid stepping into the same problem when we eventually provide support for 64 cores.

PS.
A big plus of using 1ULL is the fact that we can avoid branch in the calculation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if the mask size is increased to 64 bits, we may encounter the same problem again. Currently, the maximum value is limited to 32 in the osal::getNumCores() function, which is why I made this change.
It also seems important to document this so that the issue does not occur again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a ticket here #122 to support 64 cores, linking to the proposal #113 (comment) for bitshifting

@SangKyeong-Jeong
Copy link
Contributor Author

Done rebasing

@FScholPer
Copy link
Contributor

@pawelrutkaq can you please also check again

@FScholPer FScholPer deployed to workflow-approval March 13, 2026 15:09 — with GitHub Actions Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants