
Conversation

@xuan-cao-swi
Contributor

Description

This is a hot fix for 7.0.1 that restarts the timer thread after fork.
It also adds a thread lock to synchronize changes to the token bucket variables.
Deprecated the config option :sample_rate (it appears to be no longer used).

Test (if applicable)

@xuan-cao-swi xuan-cao-swi requested review from a team as code owners December 2, 2025 19:06
end

# Starts replenishing the bucket
def start
Contributor


The token bucket replenishment approach here differs from JS/Python: it seems a timer thread adds tokens on an interval, while the other two implementations "replenish during consume", meaning that at the time a consume call is made, the interval between the last consume and now is calculated, and the tokens that accrued during that interval are added to the bucket.

Was there a reason you went with this different approach?
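For illustration, the "replenish during consume" approach described above can be sketched in Ruby as follows. The class and method names are hypothetical, not the actual apm-ruby or apm-python API:

```ruby
# Sketch of the JS/Python-style lazy token bucket: tokens accrue based on
# the time elapsed since the last consume, with no background timer thread.
class LazyTokenBucket
  def initialize(capacity:, rate_per_sec:)
    @capacity = capacity
    @rate = rate_per_sec
    @tokens = capacity
    @last_refill = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @lock = Mutex.new
  end

  # Returns true and deducts tokens if enough have accrued, else false.
  def consume(count = 1)
    @lock.synchronize do
      refill
      return false if @tokens < count

      @tokens -= count
      true
    end
  end

  private

  # Credit the tokens that accrued since the last call, capped at capacity.
  def refill
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    elapsed = now - @last_refill
    @last_refill = now
    @tokens = [@tokens + (elapsed * @rate), @capacity].min
  end
end
```

Since replenishment happens inside `consume`, the bucket needs no thread and therefore nothing to restart after fork.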

Contributor


A minor point: given this difference, and if you intend to keep this different implementation, it does not make sense to have a MAX_INTERVAL; there should just be a DEFAULT_INTERVAL of 1 second. The intention of MAX_INTERVAL in JS/Python is to guard against the scenario where no consume is called for a long time, i.e. the app sits there without receiving requests.
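To make the distinction concrete, here is an illustrative sketch (constant names and values are hypothetical, not taken from either codebase): a timer-based replenisher only needs a fixed tick, whereas the lazy consume-time approach clamps the elapsed time it credits, so an app that sat idle does not suddenly receive a huge burst of tokens.

```ruby
DEFAULT_INTERVAL = 1.0   # seconds between timer ticks (timer-thread approach)
MAX_INTERVAL     = 20.0  # cap on elapsed seconds credited at consume (lazy approach)

# In the lazy approach, only the clamped elapsed time is converted to tokens.
def credited_elapsed(seconds_since_last_consume)
  [seconds_since_last_consume, MAX_INTERVAL].min
end
```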

Contributor Author


I think the apm-ruby approach is similar to apm-js.

apm-js also seems to have a timer that runs a task adding tokens each interval. Although it is not a thread (JS is single-threaded), I think the setInterval function in JS creates an endless loop that keeps executing the task function.

I checked apm-python, and it indeed recalculates during each consume and update.
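The timer-based alternative being discussed can be sketched in Ruby like this (a minimal sketch with illustrative names; a Ruby thread plays the role of JS's setInterval):

```ruby
# Sketch of a timer-thread token bucket: a background thread adds tokens
# on a fixed interval, independent of when consume is called.
class TimerTokenBucket
  def initialize(capacity:, tokens_per_interval:, interval: 1.0)
    @capacity = capacity
    @per_interval = tokens_per_interval
    @interval = interval
    @tokens = capacity
    @lock = Mutex.new
  end

  # Starts the replenishing thread. Note that with this design a forked
  # child must call start again, since threads do not survive fork.
  def start
    @timer = Thread.new do
      loop do
        sleep @interval
        @lock.synchronize { @tokens = [@tokens + @per_interval, @capacity].min }
      end
    end
  end

  def running?
    !@timer.nil? && @timer.alive?
  end

  def consume(count = 1)
    @lock.synchronize do
      return false if @tokens < count

      @tokens -= count
      true
    end
  end
end
```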

Contributor


Ah, my bad; I just assumed the Python implementation was based on JS.

Question for @tammy-baylis-swi or @jerrytfleung now... do you remember the reason the Python implementation does not use a background thread to replenish tokens?

And @xuan-cao-swi, how does the "reinit thread for forked child process" work in Ruby? I'm not seeing anything obvious in this PR that corresponds to the Python register_at_fork.

Contributor Author


not seeing anything obvious in this PR

Right, there is no reinitialization of the thread like reset_on_fork in http_sampler.
Instead, update/update_from_hash will start the @timer thread after fork: after forking, the settings_request will run and update the settings, which restarts the timer thread via `start unless running`.
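The restart path described above can be sketched as follows. This is a hedged illustration of the mechanism, with simplified names that mirror the description rather than the exact apm-ruby source:

```ruby
# Illustrative replenisher showing restart-after-fork via a settings update.
class Replenisher
  def initialize
    @lock = Mutex.new
    @rate = 0
  end

  def running
    !@timer.nil? && @timer.alive?
  end

  def start
    # Stand-in for the real replenish loop.
    @timer = Thread.new { sleep }
  end

  # Applying fresh settings restarts the timer if it is not alive, which is
  # always the case in a freshly forked child: a forked process inherits
  # only the thread that called fork, so @timer is dead there.
  def update_from_hash(settings)
    @lock.synchronize { @rate = settings[:rate] if settings[:rate] }
    start unless running
  end
end
```

So no explicit at-fork hook is needed: the next settings update in the child revives the thread.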

Contributor


Re:

reason for Python implementation to not use a background thread to replenish tokens?

I'm wondering if it's due to the GIL, where in a CPU-bound application the thread might not actually run as consistently on interval as we'd like, and thus would not replenish tokens accurately.

Co-authored-by: Lin Lin <lin.lin@solarwinds.com>
if settings[:interval]
@interval = settings[:interval].clamp(0, MAX_INTERVAL)
SolarWindsAPM.logger.debug { "[#{self.class}/#{__method__}] Updated interval: #{@interval}ms" }
end
Contributor


The logic is hard to follow due to the many threads and methods involved, but IIUC the settings are updated in a single background thread that runs the update_settings logic, which uses a mutex to protect the @settings data. But I'm not seeing this mutex used to synchronize the reading of @settings, which seems to happen in get_settings. It seems we're missing the case of using the settings mutex to synchronize reads: multiple application threads could be calling shouldSample (and thus getting settings) while this settings update thread is trying to write?

Then, as part of update_settings, it calls this update_from_hash method, and we're using another mutex here to protect concurrent access to the @rate, @capacity, and @tokens variables during token consume/replenish, right?

I'm also still confused about why we could be getting settings[:interval] and needing to update it here.

Finally, I think the APM Python way is easier on the brain. One less thread to think about.
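The read-synchronization concern can be sketched like this (illustrative names only, not the actual apm-ruby classes): both the background updater and the application threads calling into sampling go through the same mutex, so a read never observes a half-applied update.

```ruby
# Sketch: one mutex guards both writes (background settings thread) and
# reads (application threads), closing the race described above.
class SettingsStore
  def initialize
    @mutex = Mutex.new
    @settings = {}
  end

  # Called from the single background settings-update thread.
  def update_settings(new_settings)
    @mutex.synchronize { @settings = new_settings.dup }
  end

  # Called from many application threads (e.g. inside should_sample);
  # the same mutex guards the read.
  def get_settings
    @mutex.synchronize { @settings.dup }
  end
end
```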

Contributor Author


Seems we're missing the case of using the settings mutex to synchronize reads

Yes, I will add them later.

I'm also still confused why we could be getting settings[:interval] and needing to update it here.

Yeah, the interval part should be removed.

APM Python way is easier on the brain.

Agreed. Especially with the extra thread, the mutex makes everything complex.

@xuan-cao-swi xuan-cao-swi requested a review from cheempz December 5, 2025 16:34
@xuan-cao-swi xuan-cao-swi changed the base branch from main to 7-0-patches December 5, 2025 19:03
Contributor

@cheempz cheempz left a comment


Thanks @xuan-cao-swi. As discussed, we can get this in and then actually revamp the thread safety logic. A minor point: we should also get the sample_rate deprecation into main if it's not already there.

@xuan-cao-swi xuan-cao-swi changed the title from "[DO NOT MERGE] NH-124861: restart timer thread" to "NH-124861: restart timer thread" Dec 5, 2025
@xuan-cao-swi xuan-cao-swi merged commit 5ff759a into 7-0-patches Dec 8, 2025
44 checks passed