perf: significant search and sort performance improvements by ParthJadhav · Pull Request #40 · ParthJadhav/Rust_Search

ParthJadhav · 2026-03-02T12:39:37Z

Summary

similarity_sort 9x faster via Schwartzian transform (precompute scores instead of recomputing per comparison)
Search 12% faster via crossbeam-channel, types pre-filter, AcceptAll matcher, zero-copy path conversion, and 2x thread count
~35x faster sort for large datasets (10K+ items) with conditional rayon parallelism
Removed num_cpus dependency, replaced with std::thread::available_parallelism()
Added crossbeam-channel and rayon dependencies

Benchmark results

Benchmark	Before	After	Speedup
similarity_sort (28 items)	34.7µs	3.79µs	9.1x
similarity_sort (1K items)	1.31ms	155µs	8.5x
similarity_sort (10K items, est.)	~17.5ms	~500µs	~35x
search (home dir)	1.290s	1.137s	12%
search with limit	1.309s	1.148s	12%

Full system benchmark (3.9M files, searching from /)

Scenario	Results	Time
ext-only (.rs)	4,056	14.4s
ext-only (.txt)	17,566	13.5s
ext+limit (.rs, 1000)	1,000	16ms
no filter (all files)	3,915,838	24.6s
sort (4,056 .rs files)	4,056	1.5ms

Key optimizations

Schwartzian transform for similarity_sort — precompute all Jaro-Winkler scores once, sort by float
crossbeam-channel replacing std::sync::mpsc
Types pre-filter + AcceptAll matcher skips redundant checks for ext-only searches
Zero-copy path conversion via into_path().into_os_string().into_string()
2x thread count for better I/O overlap during directory traversal
Conditional rayon parallelism for scoring >5K items

Test plan

All 41 existing tests pass (unit, integration, doc tests)
Benchmarks run successfully on home dir, controlled dir, and full system
No behavioral changes to public API

- Skip regex entirely when only file extension is specified (no search input, not strict, not ignore_case) — uses direct OsStr comparison - Replace path.display().to_string() with to_string_lossy().into_owned() for cheaper path conversion - Improve benchmark harness with warmup iterations, median timing, and an "all" mode for running all benchmarks together - Add dirs as dev-dependency for benchmarks

- Replace std::sync::mpsc with crossbeam-channel for faster multi-producer single-consumer communication - Pre-filter files by extension using ignore crate's TypesBuilder, reducing callback invocations for non-matching files - Increase thread count to 2x CPU cores for better I/O overlap during directory traversal - Skip filter_entry closure when no filters are configured - Add controlled benchmark suite with 10,000-file test directory for reliable matching-path measurement

Use a precomputed-scores approach instead of recomputing file name extraction, lowercasing, and Jaro-Winkler similarity on every comparison. Before: O(n log n) comparisons each computing 2 scores = redundant work After: O(n) score computations + O(n log n) float comparisons Also: - Return &str instead of String from file_name_from_path to avoid alloc - Use sort_unstable_by for better cache locality on float comparisons - Apply in-place permutation to reorder results without extra allocation Benchmark results (1000 items): 1.313ms -> 155µs (8.4x faster)

Remove the num_cpus dependency in favor of the standard library's available_parallelism() (stable since Rust 1.59). This reduces the dependency count and uses the platform-native CPU detection.

Use rayon's par_iter for computing Jaro-Winkler scores in parallel when the dataset exceeds 5,000 items. Below the threshold, use sequential iteration to avoid rayon thread pool overhead. Also scale up controlled benchmark to 100,000 files for better stress testing of matching and sorting paths.

…arch - Add AcceptAll matcher variant: when the types pre-filter handles extension matching, skip redundant per-entry extension checks - Use entry.into_path().into_os_string().into_string() for zero-copy String conversion when paths are valid UTF-8 (99.9% of cases) - Remove add_defaults() from TypesBuilder to avoid loading hundreds of predefined type definitions on every search

Document all optimization checkpoints, what worked (Schwartzian transform 9x sort speedup, crossbeam-channel, zero-copy paths), what didn't work (rayon overhead for small datasets), and benchmark results for each iteration.

Add "system" mode that benchmarks searching from the root filesystem, covering ext-only, regex, limit, no-filter, hidden, strict, and case-insensitive search patterns across ~3.9M real files. Refactor benchmark helpers to reduce duplication.

- Rename `matched` to `is_match` to avoid similar_names lint - Use `is_some_and` instead of `map_or(false, ...)` for Option check - Remove needless borrows in test files - Remove unused `Path` import in bench

ParthJadhav added 9 commits March 2, 2026 01:58

perf: replace num_cpus with std::thread::available_parallelism

356caee

Remove the num_cpus dependency in favor of the standard library's available_parallelism() (stable since Rust 1.59). This reduces the dependency count and uses the platform-native CPU detection.

fix: resolve clippy warnings and formatting issues

5639adc

- Rename `matched` to `is_match` to avoid similar_names lint - Use `is_some_and` instead of `map_or(false, ...)` for Option check - Remove needless borrows in test files - Remove unused `Path` import in bench

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: significant search and sort performance improvements#40

perf: significant search and sort performance improvements#40
ParthJadhav wants to merge 9 commits intomasterfrom
perf/optimize-search-and-sort

ParthJadhav commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ParthJadhav commented Mar 2, 2026

Summary

Benchmark results

Full system benchmark (3.9M files, searching from /)

Key optimizations

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant