Performance penalty of probing ignore files under every subdirectory (especially on Windows), and a possible solution #3293

@AlanIWBFT

On Windows, both fs::metadata() and Path::exists() go through std::sys::fs::windows::File::open and eventually call CreateFileW, which is very heavy.

GetFileAttributesW is slightly lighter, but there is a better alternative. Windows prefers fetching directory metadata in batches via FindFirstFileExW/FindNextFileW, which is what fs::read_dir uses. If we prefetch the contents of the entire directory with read_dir, use that listing to check for ignore files, and visit the entries afterwards, the walk can be much faster because we avoid touching files individually. This turned out to be true. I prototyped it here: AlanIWBFT@4e8fb87. The effectiveness of this optimization depends on the ratio of directories to files: the more directories, the bigger the speedup.

Note that the prototype only touches the multi-threaded path, since that path already uses read_dir.
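
The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the prototype commit: the function names, the DirListing struct, and the IGNORE_FILE_NAMES list are hypothetical and chosen only for the example. probe_individually shows the per-file lookup pattern, and probe_in_batch shows a single read_dir enumeration that serves both the ignore-file check and the later traversal.

```rust
use std::collections::HashSet;
use std::ffi::{OsStr, OsString};
use std::fs;
use std::io;
use std::path::Path;

/// Illustrative list of ignore-file names (an assumption for this sketch).
const IGNORE_FILE_NAMES: &[&str] = &[".gitignore", ".ignore", ".rgignore"];

/// Baseline: one filesystem lookup per candidate name in every directory.
pub fn probe_individually(dir: &Path) -> Vec<&'static str> {
    IGNORE_FILE_NAMES
        .iter()
        .copied()
        .filter(|name| dir.join(*name).exists())
        .collect()
}

/// Result of enumerating a directory once.
pub struct DirListing {
    /// Ignore files found in the directory, in IGNORE_FILE_NAMES order.
    pub ignore_files: Vec<&'static str>,
    /// All entries, kept so the walker can visit them without a second read_dir.
    pub entries: Vec<fs::DirEntry>,
}

/// Batch variant: enumerate the directory once, then answer the
/// "does this ignore file exist?" question from the in-memory listing.
pub fn probe_in_batch(dir: &Path) -> io::Result<DirListing> {
    let entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    let names: HashSet<OsString> = entries.iter().map(|e| e.file_name()).collect();
    let ignore_files = IGNORE_FILE_NAMES
        .iter()
        .copied()
        .filter(|name| names.contains(OsStr::new(*name)))
        .collect();
    Ok(DirListing { ignore_files, entries })
}
```

One difference worth keeping in mind: the batch version compares names exactly, while exists() follows the filesystem's own matching rules, so on a case-insensitive filesystem the two can disagree for a file named, say, .GitIgnore. A real implementation would need to decide how to handle that.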

Benchmark results

I tested it on three large repos: Unreal Engine 5.7.3, LLVM 22.1.0 and Chromium 147.0.7727.1. Patterns:

Unreal Engine 5.7.3: FLightmapRenderer::FLightmapRenderer
LLVM 22.1.0: opt<CompileArgsFrom> CompileArgsFrom
Chromium 147.0.7727.1: #define GL_CHROMIUM_pixel_transfer_buffer_object 1

These patterns were picked so that exactly one match is found across each repository, which avoids noise from printing results to the terminal.

Each benchmark used 5 warmup runs and 10 timed runs.

Windows (Windows 11 10.0.26200.7922)

Unreal Engine 5.7.3:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 2.418 ± 0.017 | 2.386 | 2.436 | 1.00 |
| original | 3.644 ± 0.026 | 3.606 | 3.683 | 1.51 ± 0.02 |

LLVM 22.1.0:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 1.967 ± 0.052 | 1.894 | 2.046 | 1.00 |
| original | 2.396 ± 0.065 | 2.320 | 2.543 | 1.22 ± 0.05 |

Chromium 147.0.7727.1:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 6.103 ± 0.073 | 6.025 | 6.228 | 1.00 |
| original | 7.448 ± 0.094 | 7.281 | 7.547 | 1.22 ± 0.02 |
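
For reading the Relative column: the reported values are consistent with a plain ratio of means, with the uncertainty obtained by first-order error propagation for independent measurements. That formula is an assumption about how the numbers were derived, not something stated by the benchmark output itself. A small sketch (the function name relative is hypothetical):

```rust
/// Ratio of `mean` to `ref_mean`, with a first-order propagated uncertainty.
/// Assumes the two measurements are independent.
pub fn relative(mean: f64, sd: f64, ref_mean: f64, ref_sd: f64) -> (f64, f64) {
    let ratio = mean / ref_mean;
    // Relative errors add in quadrature for a quotient.
    let rel_err = ((sd / mean).powi(2) + (ref_sd / ref_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

Calling relative(3.644, 0.026, 2.418, 0.017) with the Unreal Engine numbers on Windows gives a ratio of about 1.507, matching the 1.51 in the table.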

Linux (Arch Linux 6.19.6-arch1-1)

On Linux, filesystem operations are much faster, so the question is whether batch fetching reads more data than needed. Surprisingly, it is still a win there, though by a much smaller margin.

Unreal Engine 5.7.3:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 250.6 ± 4.0 | 242.3 | 255.6 | 1.00 |
| original | 277.0 ± 1.8 | 274.8 | 280.4 | 1.11 ± 0.02 |

LLVM 22.1.0:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 176.9 ± 3.1 | 172.1 | 182.3 | 1.00 |
| original | 185.9 ± 2.5 | 181.8 | 189.7 | 1.05 ± 0.02 |

Chromium 147.0.7727.1:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 489.4 ± 3.9 | 483.1 | 493.6 | 1.00 |
| original | 520.4 ± 5.5 | 509.5 | 525.4 | 1.06 ± 0.01 |

I don't own a Mac, so I could not test on macOS.

Hardware specs

CPU: AMD Ryzen Threadripper 3990X 64-Core (128) @ 4.35 GHz
Memory: 128 GiB DDR4 3200
SSD: Intel Optane 905P 960GB
