On Windows, both `fs::metadata()` and `Path::exists()` go through `std::sys::fs::windows::File::open` and eventually call `CreateFileW`, which is very heavy.
While `GetFileAttributesW` is somewhat lighter, there is a better alternative. Windows is efficient at returning the metadata of a directory's entries in batches through `FindFirstFileExW`/`FindNextFileW`, which is what `fs::read_dir` uses. So if we prefetch the contents of the entire directory with `read_dir`, use that listing to check for the existence of ignore files, and visit the entries later, it could be much faster because we avoid touching files individually. That turned out to be true. I prototyped it here: AlanIWBFT@4e8fb87. How much this helps depends on the ratio of directories to files: the more directories, the larger the speedup.
Note that the prototype only changes the multi-threaded path, since that path already uses `read_dir`.
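To make the idea concrete, here is a minimal Rust sketch of the two strategies. It is not the code from AlanIWBFT@4e8fb87: the function names and the `IGNORE_FILES` list are assumptions for illustration. The per-path variant issues one `Path::exists` call per candidate ignore file, while the batched variant lists the directory once with `fs::read_dir`, answers the existence checks from that listing, and returns the entries so the traversal can reuse them. On Windows, `DirEntry::metadata` and `DirEntry::file_type` are served from data already returned by the enumeration, so visiting those entries later needs no extra system calls.

```rust
use std::collections::HashSet;
use std::ffi::{OsStr, OsString};
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Assumed set of per-directory ignore files, for illustration only;
// the real list lives in the `ignore` crate and may differ.
const IGNORE_FILES: [&str; 3] = [".gitignore", ".ignore", ".rgignore"];

/// Per-path approach: one existence check per candidate file. On Windows each
/// `Path::exists` goes through `fs::metadata`, i.e. a CreateFileW-based open.
fn ignore_files_per_path(dir: &Path) -> Vec<PathBuf> {
    IGNORE_FILES
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| path.exists())
        .collect()
}

/// Batched approach: enumerate the directory once with `fs::read_dir`,
/// answer the existence checks from that listing, and hand the entries back
/// so the walker can visit them without reading the directory again.
/// Caveat: the `HashSet` lookup is case-sensitive, unlike typical NTFS lookups.
fn ignore_files_batched(dir: &Path) -> io::Result<(Vec<PathBuf>, Vec<fs::DirEntry>)> {
    let entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    let names: HashSet<OsString> = entries.iter().map(|entry| entry.file_name()).collect();
    let found: Vec<PathBuf> = IGNORE_FILES
        .iter()
        .copied()
        .filter(|name| names.contains(OsStr::new(name)))
        .map(|name| dir.join(name))
        .collect();
    Ok((found, entries))
}

fn main() -> io::Result<()> {
    let dir = Path::new(".");
    let (found, entries) = ignore_files_batched(dir)?;
    println!("listed {} entries once; ignore files: {:?}", entries.len(), found);
    println!("per-path checks found: {:?}", ignore_files_per_path(dir));
    // Reuse the listing for the traversal; file_type() comes from the listing.
    for entry in &entries {
        let suffix = if entry.file_type()?.is_dir() { "/" } else { "" };
        println!("{}{}", entry.file_name().to_string_lossy(), suffix);
    }
    Ok(())
}
```

The design choice is simply to trade several point lookups per directory for one enumeration whose results are needed for the traversal anyway.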
Benchmark results
I tested it on three large repositories: Unreal Engine 5.7.3, LLVM 22.1.0 and Chromium 147.0.7727.1, with these search patterns:

- Unreal Engine 5.7.3: `FLightmapRenderer::FLightmapRenderer`
- LLVM 22.1.0: `opt<CompileArgsFrom> CompileArgsFrom`
- Chromium 147.0.7727.1: `#define GL_CHROMIUM_pixel_transfer_buffer_object 1`

The patterns were chosen so that each matches exactly once in its repository, which keeps terminal output from adding noise to the timings.
Each command had 5 warmup runs followed by 10 benchmark runs.
Windows (Windows 11 10.0.26200.7922)
Unreal Engine 5.7.3:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 2.418 ± 0.017 | 2.386 | 2.436 | 1.00 |
| original | 3.644 ± 0.026 | 3.606 | 3.683 | 1.51 ± 0.02 |

LLVM 22.1.0:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 1.967 ± 0.052 | 1.894 | 2.046 | 1.00 |
| original | 2.396 ± 0.065 | 2.320 | 2.543 | 1.22 ± 0.05 |

Chromium 147.0.7727.1:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 6.103 ± 0.073 | 6.025 | 6.228 | 1.00 |
| original | 7.448 ± 0.094 | 7.281 | 7.547 | 1.22 ± 0.02 |

Linux (Arch Linux 6.19.6-arch1-1)
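On Linux, file system operations are much cheaper, so one might worry that prefetching whole directories does more work than the individual lookups it replaces. Surprisingly, it turned out to be a win there too, though by much smaller margins (1.05× to 1.11×).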
Unreal Engine 5.7.3:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 250.6 ± 4.0 | 242.3 | 255.6 | 1.00 |
| original | 277.0 ± 1.8 | 274.8 | 280.4 | 1.11 ± 0.02 |

LLVM 22.1.0:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 176.9 ± 3.1 | 172.1 | 182.3 | 1.00 |
| original | 185.9 ± 2.5 | 181.8 | 189.7 | 1.05 ± 0.02 |

Chromium 147.0.7727.1:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| batch-fetch | 489.4 ± 3.9 | 483.1 | 493.6 | 1.00 |
| original | 520.4 ± 5.5 | 509.5 | 525.4 | 1.06 ± 0.01 |

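The Relative figures can be checked by hand. Each is the ratio of a mean to the batch-fetch mean, and the reported uncertainties are consistent with first-order error propagation, where the relative standard deviations of the two means add in quadrature. The short Rust program below is a sketch (the `relative` function and `ROWS` constant are illustrative names) that recomputes them from the means and standard deviations listed in the tables:

```rust
/// Ratio `mean / ref_mean` and its uncertainty, using first-order error
/// propagation: the relative standard deviations add in quadrature.
fn relative(mean: f64, sd: f64, ref_mean: f64, ref_sd: f64) -> (f64, f64) {
    let ratio = mean / ref_mean;
    let err = ratio * ((sd / mean).powi(2) + (ref_sd / ref_mean).powi(2)).sqrt();
    (ratio, err)
}

/// (label, original mean, original sd, batch-fetch mean, batch-fetch sd),
/// copied from the tables; units cancel, so seconds and milliseconds can mix.
const ROWS: [(&str, f64, f64, f64, f64); 6] = [
    ("Windows / Unreal Engine 5.7.3", 3.644, 0.026, 2.418, 0.017),
    ("Windows / LLVM 22.1.0", 2.396, 0.065, 1.967, 0.052),
    ("Windows / Chromium 147.0.7727.1", 7.448, 0.094, 6.103, 0.073),
    ("Linux / Unreal Engine 5.7.3", 277.0, 1.8, 250.6, 4.0),
    ("Linux / LLVM 22.1.0", 185.9, 2.5, 176.9, 3.1),
    ("Linux / Chromium 147.0.7727.1", 520.4, 5.5, 489.4, 3.9),
];

fn main() {
    for &(label, mean, sd, ref_mean, ref_sd) in ROWS.iter() {
        let (ratio, err) = relative(mean, sd, ref_mean, ref_sd);
        println!("{label}: {ratio:.2} ± {err:.2}");
    }
}
```

Rounded to two decimals, the output matches every Relative entry, for example 1.51 ± 0.02 for Unreal Engine on Windows and 1.06 ± 0.01 for Chromium on Linux.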
I don't own a Mac, so I could not test on macOS.
Hardware specs
CPU: AMD Ryzen Threadripper 3990X 64-Core (128) @ 4.35 GHz
Memory: 128 GiB DDR4 3200
SSD: Intel Optane 905P 960GB