@Doohl Doohl commented Oct 2, 2025

fixes #1143

What is this change?

This PR adds instrumentation to track the initial draw time of Android activities. A new FirstDraw span is created as a child of the activity creation span (e.g., AppStart or Created), measuring the time from activity creation until the first frame is rendered on screen.

The implementation:

  • Hooks into Android's ViewTreeObserver.OnDrawListener to detect when the first draw occurs
  • Works around a bug on Android API < 26 where draw listeners weren't properly merged into the view tree observer
  • Captures screen complexity metrics (view node count and depth) when the draw completes
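
As a rough illustration of the one-shot listener idea above, here is a minimal Java sketch. It is not the code from this PR: `DrawListener` is a hypothetical stand-in for `ViewTreeObserver.OnDrawListener`, and the `unregister` callback and `deferredWork` queue are assumptions standing in for `removeOnDrawListener` and a main-thread post. The sketch defers removal because Android documents that `removeOnDrawListener` cannot be invoked from `onDraw()`.

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for android.view.ViewTreeObserver.OnDrawListener. */
interface DrawListener {
    void onDraw();
}

/** Runs a callback on the first draw pass only, then schedules its own removal. */
final class OneShotDrawListener implements DrawListener {
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final Runnable onFirstDraw;
    private final Runnable unregister;
    private final Queue<Runnable> deferredWork;

    OneShotDrawListener(Runnable onFirstDraw, Runnable unregister, Queue<Runnable> deferredWork) {
        this.onFirstDraw = onFirstDraw;
        this.unregister = unregister;
        this.deferredWork = deferredWork;
    }

    @Override
    public void onDraw() {
        // Ignore every draw pass after the first one.
        if (!fired.compareAndSet(false, true)) {
            return;
        }
        // In real instrumentation this is where the FirstDraw span would end.
        onFirstDraw.run();
        // Queue the removal instead of running it inline (assumption: the queue
        // is drained later, outside the draw pass, e.g. by a posted Runnable).
        deferredWork.add(unregister);
    }
}
```

Calling `onDraw()` any number of times runs `onFirstDraw` once and queues `unregister` once; the caller decides when the queue is drained.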

What are the points of contention?

1. Screen view complexity attributes

Two new attributes are added to the FirstDraw span to capture screen complexity:

  • screen.view.nodes (long): Total count of all View nodes in the view hierarchy
  • screen.view.depth (long): Maximum depth of the view hierarchy
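
For reference, the two values could be computed with a simple recursive walk. The Java sketch below is illustrative only: `ViewNode` is a hypothetical stand-in for an Android `View`/`ViewGroup`, and counting the root as depth 1 is an assumption, not something this PR specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an Android View/ViewGroup; not a real Android class. */
final class ViewNode {
    final List<ViewNode> children = new ArrayList<>();

    ViewNode add(ViewNode child) {
        children.add(child);
        return this; // return the parent so calls can be chained
    }
}

final class ViewTreeStats {
    private ViewTreeStats() {}

    /** Total number of nodes in the tree, including the root. */
    static long countNodes(ViewNode root) {
        long total = 1;
        for (ViewNode child : root.children) {
            total += countNodes(child);
        }
        return total;
    }

    /** Maximum depth of the tree, counting the root as depth 1 (an assumption). */
    static long maxDepth(ViewNode root) {
        long deepest = 0;
        for (ViewNode child : root.children) {
            deepest = Math.max(deepest, maxDepth(child));
        }
        return deepest + 1;
    }
}
```

A root with two children, one of which has a single child of its own, yields 4 nodes and a depth of 3 under this definition.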

We believe these metrics will add some value if users try to debug long FirstDraw spans. Feedback here would be appreciated, though!

2. FirstDraw value

FirstDraw spans aren't a perfect signal for when Activities become 'interactable'. Rather, they can help users identify issues with the UI pipeline. To measure interactivity, users would have to manually instrument spans or events that fire when the app or activity enters a state where the end user can actually start interacting with it.

3. FirstDraw span lifecycle

FirstDraw spans are children of the Created spans. However, a FirstDraw span can end long after its parent span has ended. This can create a confusing tracing experience, but we probably don't want to change the definition of the existing Created span.

Does it make sense to keep FirstDraw as a child of Created?
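
To make the timing concern concrete, here is a small Java model of a child span that ends after its parent. `SpanTiming` is a hypothetical helper, not an OpenTelemetry type, and the millisecond values in the note below are made up for illustration.

```java
/** Hypothetical helper that records only a span's name and timestamps. */
final class SpanTiming {
    final String name;
    final long startMs;
    final long endMs;

    SpanTiming(String name, long startMs, long endMs) {
        if (endMs < startMs) {
            throw new IllegalArgumentException("span ends before it starts");
        }
        this.name = name;
        this.startMs = startMs;
        this.endMs = endMs;
    }

    long durationMs() {
        return endMs - startMs;
    }

    /** True when this span ends after the given parent span has already ended. */
    boolean outlives(SpanTiming parent) {
        return endMs > parent.endMs;
    }
}
```

With a Created span of `new SpanTiming("Created", 0, 40)` and a child of `new SpanTiming("FirstDraw", 0, 180)`, `firstDraw.outlives(created)` is true: the child is still open 140 ms after the parent closed, which is the situation described above.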

How was this tested?

Tested some scenarios with the demo app (API versions 28 and 30) to validate the correctness of the instrumentation. If you have any ideas how this could be better tested, feel free to opine!

[Two screenshots attached]

Raw trace: https://www.codebin.cc/code/cmg9yjmxb0001jz037mtuj2v0:HXhjtG6XV3Vhfh8Ewy6r1S6KfapJtmgM1S1beucxBmpG

@Doohl Doohl requested a review from a team as a code owner October 2, 2025 21:59
codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 63.15789% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.26%. Comparing base (f466e65) to head (5360ece).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...instrumentation/activity/draw/FirstDrawListener.kt | 40.90% | 13 Missing ⚠️ |
| ...droid/instrumentation/activity/ActivityTracer.java | 38.46% | 8 Missing ⚠️ |
| ...droid/instrumentation/activity/draw/WindowUtils.kt | 80.95% | 3 Missing and 1 partial ⚠️ |
| ...strumentation/activity/Pre29ActivityCallbacks.java | 33.33% | 2 Missing ⚠️ |
| ...roid/instrumentation/activity/ActivityCallbacks.kt | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1281      +/-   ##
==========================================
- Coverage   64.30%   64.26%   -0.04%     
==========================================
  Files         142      145       +3     
  Lines        3012     3087      +75     
  Branches      296      307      +11     
==========================================
+ Hits         1937     1984      +47     
- Misses        998     1025      +27     
- Partials       77       78       +1     

Contributor

@LikeTheSalad LikeTheSalad left a comment

Thank you for your contribution. These changes seem to be related to this semconv PR, where I've posted some concerns about the type of signal to use for these kinds of events and about the utility of attributes such as nodes and depth, along with a number of questions about how to handle them in certain scenarios. I'd like to reach a consensus over there first before moving forward with code changes.

Contributor

bidetofevil commented Oct 21, 2025

The problem with detecting the first frame to be drawn is that, as you mentioned, it doesn't mean the Activity is ready to be consumed; it also doesn't necessarily mean the view tree is complete. If we are dealing with a Compose-based Activity, a minimal view tree will first be drawn and the rest will be filled in after the first frame is delivered.

This means that the page complexity attribute being logged may be inaccurate - or at least not represent what you think it does. (I think there are further issues with using that metric in the first place, but one thing at a time 😅 )

I think tracking this event offers good utility, but we should probably properly contextualize it, and perhaps not add in extra data that may be noisy or misleading.

@marandaneto
Member

I think the user has to tell us when the activity is ready to be used (vs. fully drawn, etc.).
Maybe we could take inspiration from, or use, https://developer.android.com/reference/kotlin/androidx/activity/FullyDrawnReporter#reportFullyDrawn()

@marandaneto
Member

https://github.com/square/papa/tree/main also has a few things related to what we do here,
but it's more about the app's launch and not per activity, I think (worth checking).

Author

Doohl commented Oct 21, 2025

The problem with detecting the first frame to be drawn is that, as you mentioned, it doesn't mean the Activity is ready to be consumed; it also doesn't necessarily mean the view tree is complete.

Agreed, but that isn't really the purpose of tracking this event. As it stands, the best way to track when an Activity is ready to interact with (i.e. a "time to interactive") would be to track the Activity's reportFullyDrawn call. That is a separate thing altogether.

This PR is concerned with the TimeToInitialDisplay vital.

If we are dealing with a Compose-based Activity, a minimal view tree will first be drawn and the rest will be filled in after the first frame is delivered.

Yeah InitialDraw doesn't work too well in the context of Compose Activities. Open to ideas.

This means that the page complexity attribute being logged may be inaccurate - or at least not represent what you think it does. (I think there are further issues with using that metric in the first place, but one thing at a time 😅 )

The complexity / depth attributes are removed! I think they aren't really needed for this new type of telemetry tbh

I think tracking this event offers good utility, but we should probably properly contextualize it, and perhaps not add in extra data that may be noisy or misleading.

Agreed. I've removed it.

I think the user has to tell us when the activity is ready to be used (vs. fully drawn, etc.).
Maybe we could take inspiration from, or use

Yeah, the dev has to instrument something that can broadcast when an app / activity is ready to interact with. But we can automatically instrument, at least, when the Activity is ready to draw or has begun drawing.

Contributor

LikeTheSalad commented Oct 22, 2025

Hi @Doohl

Again, thank you for taking the time to make this contribution and for your patience. I'm following up here based on this comment from the related semconv PR. I've covered a lot of details there that seem relevant to this work, and, in a nutshell, I'd like to try and see if this PR can help make progress in the semconv one. Please take a look at that comment for more details.

Before diving deep into the implementation details you propose here, I'd like to make sure we're all on the same page about the outcome you'd like to achieve. I know that the idea behind the semconv PR is to define a platform-agnostic span, which I think would be great, but for practical purposes, I'd like to better understand what it means specifically for Android. So I'd like to use a visualization that relies on Android-specific terms to see if that helps:

[Image: Screenshot 2025-10-22 at 14 33 29]

The image shows 3 possible scenarios (it used to be 4, but then I realized that 3 is enough) that can be covered with a span. Which scenario do you think best covers the outcome that you'd expect from this implementation?

[DecorView](https://developer.android.com/reference/android/view/Window#getDecorView())
* Attributes:
* `activity.name`: name of activity
* `screen.name`: name of screen

suggestion: We've recently aligned on using app.screen.name for the attribute key, could we update to use that? Related PR: https://github.com/open-telemetry/semantic-conventions/pull/2744/files

### First Draw

* Type: Span
* Name: `FirstDraw`

suggestion: I'm currently proposing app.screen.time_to_first_draw in open-telemetry/semantic-conventions#2831. PR is still open so it may change but I don't think we should be using Pascal case regardless. I know that existing spans e.g. AppStart are in Pascal case but I recall both Jason and Hanson saying that this is something that needs to be fixed in the OTel Android SDK since it doesn't align with the usual casing convention of OTel.

Development

Successfully merging this pull request may close these issues.

First draw instrumentation

5 participants