@marcotc
Member

@marcotc marcotc commented Nov 21, 2025

(I'll squash the commits once reviews finish)

What does this PR do?

This PR ensures that errors raised from native ext/ code carry enough context to be useful in telemetry.

Motivation:

Because error information sent to telemetry cannot have arbitrary, dynamic data, we must ensure that we are only sending values that are known at gem build time.

Because our fallback is to only report the exception class, this PR adds a telemetry_message to native exceptions, so that their error information is not completely lost.

Change log entry

Yes. Telemetry: Added static error reporting for native extensions.

Additional Notes:

There's a follow up PR to reduce code duplication for the datadog_ruby_common.c/h files: #5088

How to test the change?
The easiest and fastest way to test it: bundle exec rake clean compile && bundle exec rake spec:profiling
There are more products supported by libdatadog, but at that point, just run CI :)

bouwkast and others added 26 commits November 7, 2025 11:40
This gets us closer to allowing these errors to be
sent to telemetry.
Add ruby_helpers.h include to 8 C files that use datadog_profiling_error_class
and datadog_profiling_internal_error_class but were missing the header declaration.

This fixes the compilation error:
  error: 'datadog_profiling_error_class' undeclared

Files fixed:
- clock_id_from_pthread.c
- collectors_gc_profiling_helper.c
- collectors_stack.c
- collectors_thread_context.c
- encoded_profile.c
- libdatadog_helpers.c
- private_vm_api_access.c
- unsafe_api_calls_check.c
Move ruby_helpers.h include after private VM headers to avoid conflicts.
This file requires private VM headers to be included first before any
public Ruby headers, but ruby_helpers.h includes datadog_ruby_common.h
which includes ruby.h, causing header ordering conflicts.

Fixes compilation error: 'expected ')' before '==' token in RHASH_EMPTY_P'
Cannot include ruby_helpers.h in this file as it pulls in public Ruby headers
(via datadog_ruby_common.h) that conflict with private VM headers.

Instead, declare the exception class globals as extern, following the pattern
already established in this file for other declarations.

This fully resolves the header ordering compilation error.
Method was renamed from safe_exception_message to constant_exception_message
but the RBS signature file was not updated, causing Steep type errors.
The error method must be public but was accidentally made private when
constant_exception_message was added. Moving it before the private keyword
restores its public visibility.

Fixes test failure: NoMethodError: private method 'error' called
Serialization errors contain dynamic libdatadog content, so they should
raise ProfilingInternalError (not ProfilingError or RuntimeError).

Updated both the Ruby wrapper code and the test expectation to use
ProfilingInternalError consistently.

Fixes test failure expecting ProfilingError but getting RuntimeError.
Signed-off-by: Marco Costa <marco.costa@datadoghq.com>
@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Nov 21, 2025
@pr-commenter

pr-commenter bot commented Nov 21, 2025

Benchmarks

Benchmark execution time: 2026-01-07 00:31:35

Comparing candidate commit c3c9ef4 in PR branch marcotc/error-logs-remediation-custom-profiler-code with baseline commit ae7f953 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 43 metrics, with 2 unstable metrics.

scenario:profiling - stack collector (ruby frames - native filenames enabled)

  • 🟩 throughput [+168.830op/s; +170.917op/s] or [+5.329%; +5.394%]

@marcotc marcotc force-pushed the marcotc/error-logs-remediation-custom-profiler-code branch from b2d87f4 to 6a6bdfb Compare December 5, 2025 23:43
@marcotc marcotc marked this pull request as ready for review December 6, 2025 00:03
@marcotc marcotc requested review from a team as code owners December 6, 2025 00:03
Signed-off-by: Marco Costa <marco.costa@datadoghq.com>
@marcotc marcotc self-assigned this Dec 8, 2025
@ivoanjo
Member

ivoanjo commented Dec 11, 2025

The failure in the benchmarks...

+ bundle exec ruby benchmarks/profiling_sample_gvl.rb
Current pid is 3296
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
Warming up --------------------------------------
profiling - gvl benchmark samples
benchmarks/profiling_sample_gvl.rb:64:in `_native_sample_after_gvl_running': wrong number of arguments (given 3, expected 2) (ArgumentError)
        Datadog::Profiling::Collectors::ThreadContext::Testing._native_sample_after_gvl_running(@collector, @target_thread, false)
                                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	from benchmarks/profiling_sample_gvl.rb:64:in `block (2 levels) in run_benchmark'
	from (eval at /usr/local/bundle/gems/benchmark-ips-2.14.0/lib/benchmark/ips/job/entry.rb:63):6:in `call_times'
	from /usr/local/bundle/gems/benchmark-ips-2.14.0/lib/benchmark/ips/job.rb:285:in `block in run_warmup'
	from /usr/local/bundle/gems/benchmark-ips-2.14.0/lib/benchmark/ips/job.rb:268:in `each'
	from /usr/local/bundle/gems/benchmark-ips-2.14.0/lib/benchmark/ips/job.rb:268:in `run_warmup'
	from /usr/local/bundle/gems/benchmark-ips-2.14.0/lib/benchmark/ips/job.rb:253:in `block in run'

is because the "use the test code from this branch with master" doesn't work if master isn't API-compatible with this branch.

TBH, what I've done in the past is disable the benchmark and re-enable it later, but it's a kinda meh workaround. Suggestions welcome 😅

Member

@ivoanjo ivoanjo left a comment

This is looking great! The one reason I didn't press the approve button is the raise_error in datadog_ruby_common.c; that one doesn't quite look correct.

Comment on lines 93 to 96
void datadog_ruby_common_init(VALUE datadog_module) {
  // No longer needed - using Ruby's built-in exception classes
  (void)datadog_module;
}
Member

I'd suggest removing this + its callsites; it's a bit of complexity (including comments in "profiling.c" saying this needs to be called first) and I'm not sure it's worth leaving around.

Member Author

This method should actually be initializing the ID telemetry_message_id.
The reason it isn't is that I inadvertently reverted some of the usages of raise_error in the libdatadog folder.

Now that I've added them back, we need to keep these init methods.

Comment on lines 24 to 30
void raise_error(VALUE error_class, const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  VALUE message = rb_vsprintf(fmt, args);
  va_end(args);
  rb_raise(error_class, "%"PRIsVALUE, message);
}
Member

Wait, I'm confused by this... Is this a leftover from an earlier version?

That is, I thought the correct version of raise_error was the #define raise_error we have in ruby_helpers.c (the one that grabs the format string separately, etc); this one seems to just be a weird direct passthrough to rb_raise.

Am I missing something? 🤔 👀

Member Author

Nice catch!
This is left over from before.

I deleted it and fixed the references, which caused a small cascade of method moves.

Comment on lines 63 to 99
  it "is an instance of RuntimeError" do
    expect { raise_native_runtime_error }.to raise_exception(RuntimeError)
  end
end

context "when raising ArgumentError" do
  subject(:raise_native_argument_error) do
    described_class::Testing._native_grab_gvl_and_raise(::ArgumentError, "argument error test", nil, true)
  end

  it "raises an ArgumentError" do
    expect { raise_native_argument_error }.to raise_error(::ArgumentError) do |error|
      expect(error.message).to eq("argument error test")
      expect(error.instance_variable_get(:@telemetry_message)).to eq("argument error test")
    end
  end

  it "is an instance of ArgumentError" do
    expect { raise_native_argument_error }.to raise_exception(ArgumentError)
  end
end

context "when raising TypeError" do
  subject(:raise_native_type_error) do
    described_class::Testing._native_grab_gvl_and_raise(::TypeError, "type error test", nil, true)
  end

  it "raises a TypeError" do
    expect { raise_native_type_error }.to raise_error(::TypeError) do |error|
      expect(error.message).to eq("type error test")
      expect(error.instance_variable_get(:@telemetry_message)).to eq("type error test")
    end
  end

  it "is an instance of TypeError" do
    expect { raise_native_type_error }.to raise_exception(TypeError)
  end
Member

Minor: The "is an instance of" tests can be removed, since the raise_error(...) validations in the "it raises ..." test cases already check that.

Member Author

Thank you! They are totally redundant!

Comment on lines 78 to 85
 #define ENFORCE_SUCCESS_HELPER(expression, have_gvl) \
-  { int result_syserr_errno = expression; if (RB_UNLIKELY(result_syserr_errno)) raise_syserr(result_syserr_errno, have_gvl, ADD_QUOTES(expression), __FILE__, __LINE__, __func__); }
+  { int result_syserr_errno = expression; if (RB_UNLIKELY(result_syserr_errno)) raise_enforce_syserr(result_syserr_errno, have_gvl, ADD_QUOTES(expression), __FILE__, __LINE__, __func__); }
 
 #define RUBY_NUM_OR_NIL(val, condition, conv) ((val condition) ? conv(val) : Qnil)
 #define RUBY_AVG_OR_NIL(total, count) ((count == 0) ? Qnil : DBL2NUM(((double) total) / count))
 
 // Called by ENFORCE_SUCCESS_HELPER; should not be used directly
-NORETURN(void raise_syserr(
+NORETURN(void raise_enforce_syserr(
Member

Minor: To match the private naming we used above, maybe these should become private_raise_syserr?

Member Author

The name private_raise_syserr already exists 😅, but I renamed it to private_raise_enforce_syserr.

@marcotc marcotc requested a review from ivoanjo December 24, 2025 02:02
@ivoanjo
Member

ivoanjo commented Jan 7, 2026

@marcotc marcotc requested a review from ivoanjo 2 weeks ago

Btw this one is still on my radar, catching up on a pile of things this week! 😅
