
perf: significant search and sort performance improvements #40

Open
ParthJadhav wants to merge 9 commits into master from perf/optimize-search-and-sort

Summary

  • similarity_sort 9x faster via Schwartzian transform (precompute scores instead of recomputing per comparison)
  • Search 12% faster via crossbeam-channel, types pre-filter, AcceptAll matcher, zero-copy path conversion, and 2x thread count
  • ~35x faster sort (estimated) for large datasets (10K+ items) with conditional rayon parallelism
  • Removed num_cpus dependency, replaced with std::thread::available_parallelism()
  • Added crossbeam-channel and rayon dependencies

Benchmark results

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| similarity_sort (28 items) | 34.7µs | 3.79µs | 9.1x |
| similarity_sort (1K items) | 1.31ms | 155µs | 8.5x |
| similarity_sort (10K items, est.) | ~17.5ms | ~500µs | ~35x |
| search (home dir) | 1.290s | 1.137s | 12% |
| search with limit | 1.309s | 1.148s | 12% |

Full system benchmark (3.9M files, searching from /)

| Scenario | Results | Time |
|---|---|---|
| ext-only (.rs) | 4,056 | 14.4s |
| ext-only (.txt) | 17,566 | 13.5s |
| ext+limit (.rs, 1000) | 1,000 | 16ms |
| no filter (all files) | 3,915,838 | 24.6s |
| sort (4,056 .rs files) | 4,056 | 1.5ms |

Key optimizations

  1. Schwartzian transform for similarity_sort — precompute all Jaro-Winkler scores once, sort by float
  2. crossbeam-channel replacing std::sync::mpsc
  3. Types pre-filter + AcceptAll matcher skips redundant checks for ext-only searches
  4. Zero-copy path conversion via into_path().into_os_string().into_string()
  5. 2x thread count for better I/O overlap during directory traversal
  6. Conditional rayon parallelism for scoring >5K items

Test plan

  • All 41 existing tests pass (unit, integration, doc tests)
  • Benchmarks run successfully on home dir, controlled dir, and full system
  • No behavioral changes to public API

- Skip regex entirely when only file extension is specified (no search
  input, not strict, not ignore_case) — uses direct OsStr comparison
- Replace path.display().to_string() with to_string_lossy().into_owned()
  for cheaper path conversion
- Improve benchmark harness with warmup iterations, median timing, and
  an "all" mode for running all benchmarks together
- Add dirs as dev-dependency for benchmarks
- Replace std::sync::mpsc with crossbeam-channel for faster multi-producer
  single-consumer communication
- Pre-filter files by extension using ignore crate's TypesBuilder, reducing
  callback invocations for non-matching files
- Increase thread count to 2x CPU cores for better I/O overlap during
  directory traversal
- Skip filter_entry closure when no filters are configured
- Add controlled benchmark suite with 10,000-file test directory for
  reliable matching-path measurement
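
The extension-only fast path and the cheaper path conversion described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `has_extension` and `path_to_string` are hypothetical helper names, and the real implementation works on directory entries from the ignore crate rather than on bare `Path` values.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: returns true when `path` has exactly the extension
/// `ext` (given without the leading dot). The comparison is done on `OsStr`
/// values, so no regex is involved and no `String` is allocated.
fn has_extension(path: &Path, ext: &str) -> bool {
    path.extension() == Some(OsStr::new(ext))
}

/// Hypothetical helper: converts a path to an owned `String`, replacing any
/// invalid UTF-8 with U+FFFD. `to_string_lossy` borrows when the path is
/// already valid UTF-8, so the only allocation is the final `into_owned`.
fn path_to_string(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}
```

Note that `Path::extension` only looks at the last component after the final dot, so `archive.tar.gz` matches `gz` and a file without an extension never matches.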

Use a precomputed-scores approach instead of recomputing file name
extraction, lowercasing, and Jaro-Winkler similarity on every comparison.

Before: O(n log n) comparisons each computing 2 scores = redundant work
After: O(n) score computations + O(n log n) float comparisons

Also:
- Return &str instead of String from file_name_from_path to avoid alloc
- Use sort_unstable_by for better cache locality on float comparisons
- Apply in-place permutation to reorder results without extra allocation

Benchmark results (1000 items): 1.313ms -> 155µs (8.4x faster)
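
As a rough sketch of the decorate, sort, undecorate pattern described above: each score is computed once, and the sort then only compares floats. The names here are illustrative assumptions. `toy_similarity` is a hypothetical stand-in metric, not Jaro-Winkler, and `sort_by_similarity` is not the crate's `similarity_sort`. For simplicity the sketch moves the strings through a temporary vector instead of applying an in-place permutation.

```rust
/// Hypothetical stand-in for a similarity metric: the fraction of positions
/// at which both strings hold the same character. This is NOT Jaro-Winkler.
fn toy_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    let same = a.chars().zip(b.chars()).filter(|(x, y)| x == y).count();
    same as f64 / max_len as f64
}

/// Returns the last path component as a borrowed slice (no allocation).
fn file_name(path: &str) -> &str {
    path.rsplit(|c| c == '/' || c == '\\').next().unwrap_or(path)
}

/// Sorts `paths` so that file names most similar to `query` come first.
fn sort_by_similarity(paths: &mut Vec<String>, query: &str) {
    let query = query.to_lowercase();

    // Decorate: compute each score exactly once, O(n) score computations.
    let mut scored: Vec<(f64, String)> = paths
        .drain(..)
        .map(|p| {
            let score = toy_similarity(&file_name(&p).to_lowercase(), &query);
            (score, p)
        })
        .collect();

    // Sort: O(n log n) float comparisons, highest score first.
    scored.sort_unstable_by(|a, b| b.0.total_cmp(&a.0));

    // Undecorate: put the paths back in sorted order.
    paths.extend(scored.into_iter().map(|(_, p)| p));
}
```

Because `sort_unstable_by` does not preserve the relative order of equal scores, ties may come out in any order.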

Remove the num_cpus dependency in favor of the standard library's
available_parallelism() (stable since Rust 1.59). This reduces the
dependency count and uses the platform-native CPU detection.
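
A minimal sketch of the replacement, assuming a hypothetical helper named `worker_threads`: `std::thread::available_parallelism()` returns an `io::Result<NonZeroUsize>`, so the caller needs a fallback for platforms where the value cannot be queried. The `multiplier` parameter illustrates the 2x thread count mentioned elsewhere in this PR.

```rust
use std::thread;

/// Hypothetical helper: number of worker threads to spawn, computed as the
/// detected parallelism times `multiplier`. Falls back to a single core when
/// the platform cannot report its parallelism, and never returns zero.
fn worker_threads(multiplier: usize) -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    cores.saturating_mul(multiplier).max(1)
}
```

Calling `worker_threads(2)` would give the oversubscribed count used for directory traversal, where threads spend much of their time waiting on I/O.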

Use rayon's par_iter for computing Jaro-Winkler scores in parallel when
the dataset exceeds 5,000 items. Below the threshold, use sequential
iteration to avoid rayon thread pool overhead.

Also scale up controlled benchmark to 100,000 files for better stress
testing of matching and sorting paths.
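
The threshold idea can be illustrated without any external crate. The sketch below deliberately swaps rayon's `par_iter` for `std::thread::scope` so that it stays self-contained. `PARALLEL_THRESHOLD` and `score_all` are hypothetical names, and the chunking strategy is an assumption rather than what rayon does internally.

```rust
use std::thread;

/// Assumed cutoff, mirroring the 5,000-item threshold described above.
const PARALLEL_THRESHOLD: usize = 5_000;

/// Computes one score per item, in input order. Small inputs are scored
/// sequentially; larger inputs are split into chunks scored on scoped threads.
fn score_all<F>(items: &[String], score: F) -> Vec<f64>
where
    F: Fn(&str) -> f64 + Sync,
{
    if items.len() <= PARALLEL_THRESHOLD {
        // Below the threshold, thread start-up cost would outweigh the work.
        return items.iter().map(|s| score(s.as_str())).collect();
    }

    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let chunk_len = (items.len() + workers - 1) / workers;
    let score = &score;

    thread::scope(|scope| {
        // Spawn every chunk first so that all of them run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|s| score(s.as_str()))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();

        // Join in spawn order, which keeps the scores aligned with `items`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scoring thread panicked"))
            .collect()
    })
}
```

With rayon the parallel branch would reduce to a single `par_iter().map(...).collect()` call, and the reused thread pool avoids the per-call spawn cost that this std-only version pays.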

…arch

- Add AcceptAll matcher variant: when the types pre-filter handles
  extension matching, skip redundant per-entry extension checks
- Use entry.into_path().into_os_string().into_string() for zero-copy
  String conversion when paths are valid UTF-8 (99.9% of cases)
- Remove add_defaults() from TypesBuilder to avoid loading hundreds of
  predefined type definitions on every search
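
The zero-copy conversion in the list above relies on two standard-library calls: `PathBuf::into_os_string` and `OsString::into_string`, which returns the original `OsString` as the error value when the bytes are not valid UTF-8. A minimal sketch, assuming a hypothetical helper `path_into_string` that starts from an owned `PathBuf` (in the PR that value comes from `entry.into_path()`); the lossy fallback is an assumption about how invalid paths might be handled.

```rust
use std::path::PathBuf;

/// Hypothetical helper: converts an owned path into a `String`.
/// For valid UTF-8 the existing buffer is reused, so nothing is copied.
/// Otherwise the path is converted lossily, which does allocate.
fn path_into_string(path: PathBuf) -> String {
    match path.into_os_string().into_string() {
        Ok(s) => s,
        Err(os) => os.to_string_lossy().into_owned(),
    }
}
```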

Document all optimization checkpoints, what worked (Schwartzian transform
9x sort speedup, crossbeam-channel, zero-copy paths), what didn't work
(rayon overhead for small datasets), and benchmark results for each
iteration.

Add "system" mode that benchmarks searching from the root filesystem,
covering ext-only, regex, limit, no-filter, hidden, strict, and
case-insensitive search patterns across ~3.9M real files. Refactor
benchmark helpers to reduce duplication.

- Rename `matched` to `is_match` to avoid similar_names lint
- Use `is_some_and` instead of `map_or(false, ...)` for Option check
- Remove needless borrows in test files
- Remove unused `Path` import in bench