Optimization: Replace shift().fillna(0) with shift(fill_value=0) for cleaner and faster Series manipulation #14

@SaFE-APIOpt

Hi, I’d like to suggest a small optimization to this line:
self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift().fillna(0).to_dict()
This can be rewritten as:
self.task_start_index_dict = self.examples_per_task_srs.cumsum().shift(fill_value=0).to_dict()
Using shift(fill_value=0) folds the missing-value fill into the shift itself. This should be slightly faster and lighter on memory, because it avoids building an intermediate Series with a NaN in the first position and then making a second pass with fillna(). The result is cleaner, single-step logic. Note that the fill_value parameter requires pandas 0.24.0 or later.

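The current form with .shift().fillna(0) adds overhead by first generating a new Series containing a NaN and then scanning the whole Series to fill it. The two forms produce the same values, with one difference worth checking: if examples_per_task_srs holds integers, shift() upcasts the Series to float64 to make room for the NaN, so the dictionary values come out as floats (0.0 rather than 0), whereas shift(fill_value=0) keeps the integer dtype. Since these values are start indices, integers are arguably the better fit, but any code that relies on float values should be reviewed. The speed gain is likely small for a short Series, so the main benefits are readability and an unchanged dtype, with the performance benefit mattering mostly in time-critical or repeated data-processing steps.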
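
For reference, here is a minimal sketch comparing the two forms. The task names and counts are made up for illustration (the real contents of examples_per_task_srs are not shown in this issue), and the two helper functions are hypothetical wrappers around the expressions quoted above.

```python
import pandas as pd

# Hypothetical per-task example counts, for illustration only.
examples_per_task_srs = pd.Series({"task_a": 3, "task_b": 5, "task_c": 2})


def start_index_two_step(srs: pd.Series) -> dict:
    """Current form: shift() inserts a NaN, then fillna(0) replaces it."""
    return srs.cumsum().shift().fillna(0).to_dict()


def start_index_one_step(srs: pd.Series) -> dict:
    """Proposed form: the fill happens inside shift() (pandas >= 0.24.0)."""
    return srs.cumsum().shift(fill_value=0).to_dict()


two_step = start_index_two_step(examples_per_task_srs)
one_step = start_index_one_step(examples_per_task_srs)

# The values compare equal (0.0 == 0), so lookups by task name agree.
print(two_step == one_step)

# The dtypes differ: the NaN forces float64 in the two-step form,
# while fill_value=0 keeps the integer dtype.
two_step_dtype = examples_per_task_srs.cumsum().shift().fillna(0).dtype
one_step_dtype = examples_per_task_srs.cumsum().shift(fill_value=0).dtype
print(two_step_dtype, one_step_dtype)
```

If the start indices are later used for positional slicing, the integer result of the one-step form avoids a float-to-int conversion. A timing comparison with timeit on a realistically sized Series would be needed to confirm the size of any speedup.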
