Conversation

@lithomas1 (Collaborator) commented May 10, 2025

Changes

  • Fixed association of kernel launches with their kernels
  • Added a time-breakdown function that reports the time spent in each annotation
  • Upgraded flat_profile to show breakdowns at different parallelism levels

@lithomas1 lithomas1 marked this pull request as ready for review May 24, 2025 16:35
@lithomas1 lithomas1 requested a review from jhdavis8 May 24, 2025 16:35
@jhdavis8 jhdavis8 requested a review from Copilot May 28, 2025 17:15

Copilot AI left a comment


Pull Request Overview

Introduces enhancements to profiling and annotation APIs, including improved kernel-event association, a new breakdown function for annotation timings, and extended flat profiling with custom grouping and parallelism controls.

  • Expanded flat_profile method with mapper, parallelism_level, ascending, and idle_time options
  • Added time_breakdown to compute CPU and launched-kernel time per annotation
  • Updated SQLite reader to disable unused CUPTI events and switch from _children to _kernel_launch
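The new time_breakdown described above can be emulated in plain pandas to show the intended semantics: CPU time comes from the annotation events themselves, launched-kernel time from the kernels attributed to them. This is a minimal sketch under an assumed toy schema (the Kind and Annotation columns here are illustrative, not pipit's real layout):

```python
import pandas as pd

# Toy trace: host-side annotations plus GPU kernels they launched.
# Column names are assumptions for illustration, not pipit's schema.
events = pd.DataFrame(
    {
        "Name": ["step", "step", "gemm", "copy"],
        "Kind": ["annotation", "annotation", "kernel", "kernel"],
        "Annotation": [None, None, "step", "step"],
        "time.exc": [5.0, 3.0, 2.0, 1.0],
    }
)

# CPU time: exclusive time of the annotation events themselves.
cpu = events[events["Kind"] == "annotation"].groupby("Name")["time.exc"].sum()

# GPU time: exclusive time of kernels, attributed to their annotation.
gpu = events[events["Kind"] == "kernel"].groupby("Annotation")["time.exc"].sum()

# Align the two series into one per-annotation breakdown table.
breakdown = pd.DataFrame({"cpu_time": cpu, "gpu_time": gpu}).fillna(0.0)
print(breakdown)
```

Here the "step" annotation accumulates 8.0 units of CPU time and 3.0 units of launched-kernel time.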

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File                                    Description
pipit/trace.py                          Enhanced flat_profile; added time_breakdown; removed dropna=False and a debug print
pipit/readers/nsight_sqlite_reader.py   Commented out the MEMCPY/MEMSET/SYNCH SQL; replaced _children assignments with _kernel_launch
Comments suppressed due to low confidence (4)

pipit/readers/nsight_sqlite_reader.py:293

  • Swapping _children for _kernel_launch breaks consumers (e.g., time_breakdown and filter_by_label) that expect _children. Ensure both fields are populated or update all references.
trace_df.loc[calls_that_launch["index_x"].to_numpy(), "_kernel_launch"] = (calls_that_launch["index_y"].to_numpy())
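The assignment pattern flagged here, pointing launch-API rows at their kernel rows, can be reproduced in isolation. The sketch below merges launches and kernels on a CUPTI-style correlation ID (toy data; the column names and the cudaLaunchKernel filter are assumptions, not the reader's actual logic):

```python
import pandas as pd

# Launch-API calls on the CPU and the kernels they triggered on the GPU,
# linked by a correlation ID (toy data, assumed schema).
trace_df = pd.DataFrame(
    {
        "Name": ["cudaLaunchKernel", "cudaLaunchKernel", "gemm", "copy"],
        "correlation_id": [10, 11, 10, 11],
        "_kernel_launch": [None, None, None, None],
    }
)

launches = trace_df[trace_df["Name"] == "cudaLaunchKernel"].reset_index()
kernels = trace_df[trace_df["Name"] != "cudaLaunchKernel"].reset_index()

# The merge yields index_x (launch row) and index_y (matching kernel row).
calls_that_launch = launches.merge(
    kernels, on="correlation_id", suffixes=("_x", "_y")
)

# Point each launch at the DataFrame index of its kernel.
trace_df.loc[
    calls_that_launch["index_x"].to_numpy(), "_kernel_launch"
] = calls_that_launch["index_y"].to_numpy()
print(trace_df["_kernel_launch"].tolist())
```

The concern above is that only _kernel_launch ends up populated this way, so any consumer still reading _children sees nothing.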

pipit/trace.py:1040

  • [nitpick] Using DataFrame.apply with Python loops for each kernel may be slow on large traces. Consider vectorized operations or grouping strategies to improve performance.
kernels.apply(_calc_kernel_time, axis=1,)
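To illustrate the vectorization this nitpick suggests: _calc_kernel_time's body is not shown in the diff, but assuming it derives a per-kernel duration, the row-by-row apply can typically be replaced by one column subtraction plus a grouped sum. A sketch with assumed column names:

```python
import pandas as pd

kernels = pd.DataFrame(
    {
        "Annotation": ["step", "step", "init"],
        "Timestamp (ns)": [100, 300, 50],
        "End (ns)": [200, 450, 120],
    }
)

# Row-by-row version, roughly what apply(_calc_kernel_time, axis=1) costs:
slow = kernels.apply(lambda row: row["End (ns)"] - row["Timestamp (ns)"], axis=1)

# Vectorized version: one column subtraction, then a grouped sum.
kernels["duration"] = kernels["End (ns)"] - kernels["Timestamp (ns)"]
per_annotation = kernels.groupby("Annotation")["duration"].sum()
print(per_annotation.to_dict())
```

Both paths compute the same durations; the vectorized one avoids a Python-level call per row, which matters on large traces.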

pipit/readers/nsight_sqlite_reader.py:71

  • [nitpick] Large commented-out SQL queries add clutter. If these events are unused long-term, remove the dead code or move it to documentation.
#     """ ... large commented SQL block ... """

pipit/trace.py:528

  • New parameters and logic in flat_profile and time_breakdown lack dedicated unit tests. Add tests covering custom mapper, parallelism levels, idle_time, sorting, and the new breakdown method.
def flat_profile(..., mapper=None, parallelism_level=None, ascending=None, idle_time=False):

pd_grouper[label] = group
res = (
res.set_index("Name")
.groupby([pd_grouper] + parallelism_level)[["time.exc"]]

Copilot AI May 28, 2025


The mapper branch hard-codes time.exc instead of using the metrics parameter; this will ignore any other metrics list provided. Use the metrics variable here.

Suggested change:

-    .groupby([pd_grouper] + parallelism_level)[["time.exc"]]
+    .groupby([pd_grouper] + parallelism_level)[metrics]



I'll agree with this, worth making sure this function works when other metrics than time.exc are passed in.
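The fix both reviewers endorse is easy to sanity-check in isolation: pandas' groupby accepts a label-to-group dict when grouping an index, and selecting with the metrics list generalizes to any set of columns. A toy check (data and group names are assumptions; pd_grouper mirrors the mapper postprocessing in the diff):

```python
import pandas as pd

res = pd.DataFrame(
    {
        "Name": ["gemm", "copy", "gemm"],
        "time.exc": [2.0, 1.0, 3.0],
        "time.inc": [4.0, 1.5, 5.0],
    }
)

# Mirror the PR's mapper postprocessing: a label -> group dict,
# applied to the "Name" index by groupby.
pd_grouper = {"gemm": "compute", "copy": "memory"}
metrics = ["time.exc", "time.inc"]

# [metrics] instead of the hard-coded [["time.exc"]] keeps every
# requested metric column in the result.
out = res.set_index("Name").groupby(pd_grouper)[metrics].sum()
print(out)
```

With the hard-coded [["time.exc"]], the time.inc column would silently disappear from the result.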


@jhdavis8 jhdavis8 left a comment


Features and documentation look good overall. I added a few comments.

One high-level question: are there new unit tests checking the added features? It would be good to add these here or in a later PR.


# Use explode to expand every child in children list to a row
# This can include duplicates (e.g. for nested annotations) that we should drop
kernels = events.loc[host_events["_children"].dropna().explode().to_numpy()]


_children referred to here
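The _children lookup quoted in this thread is the consumer the earlier comment worries about. In isolation the explode pattern behaves like this (toy frames with assumed row ids; the duplicate child id models a nested annotation):

```python
import pandas as pd

# events indexed by row id; rows 2 and 3 are kernels (assumed layout).
events = pd.DataFrame(
    {"Name": ["outer", "inner", "gemm", "copy"]}, index=[0, 1, 2, 3]
)

# Host annotations each carry a list of child row ids; the nested
# "inner" annotation repeats kernel 2, so explode yields a duplicate.
host_events = pd.DataFrame(
    {"_children": [[1, 2, 3], [2], None]}, index=[0, 1, 5]
)

# Expand every child in each children list to its own row.
child_ids = host_events["_children"].dropna().explode()
kernels = events.loc[child_ids.to_numpy()]

# Drop the duplicate rows that nesting introduced.
kernels = kernels[~kernels.index.duplicated()]
print(kernels["Name"].tolist())
```

If _children is no longer populated by the reader, child_ids is empty here, which is exactly the breakage the earlier comment predicts.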

@jhdavis8 jhdavis8 self-requested a review September 22, 2025 19:40
# Postprocessing using mapper
if mapper is not None:
# pandas expects label->group
labels = res["Name"]
@ocnkr (Contributor) Oct 12, 2025


Should this be groupby_column? I'm not sure, but I wanted to double-check. I think we should decide whether we want the hard-coded "Name" column or groupby_column everywhere "Name" is used.

ocnkr commented Oct 12, 2025



3 participants