Commit bad04e6

Enhance configuration and caching mechanisms, add clear_cache option, and improve binary file handling
1 parent 9272d2e commit bad04e6

25 files changed: +3810 -328 lines changed

.github/workflows/ci.yml

Lines changed: 5 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,9 @@ name: CI

 on:
   push:
-    branches: [ master ]
+    branches: [master]
   pull_request:
-    branches: [ master ]
+    branches: [master]

 env:
   CARGO_TERM_COLOR: always
@@ -15,7 +15,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ ubuntu-latest, windows-latest, macos-latest ]
+        os: [ubuntu-latest, windows-latest, macos-latest]
     steps:
       - name: Checkout code
         uses: actions/checkout@v5
@@ -43,20 +43,14 @@ jobs:
       - name: Build (default features)
         run: cargo build --verbose

-      - name: Build (no default features)
-        run: cargo build --no-default-features --verbose
-
       - name: Build (all features)
         run: cargo build --all-features --verbose

       - name: Run tests
-        run: cargo test --verbose
-
-      - name: Run tests (no default features)
-        run: cargo test --no-default-features --verbose
+        run: cargo test --verbose -- --test-threads=1

       - name: Run tests (all features)
-        run: cargo test --all-features --verbose
+        run: cargo test --all-features --verbose -- --test-threads=1

   security:
     name: Security Audit
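
The `--test-threads=1` flag added above serializes every test in the binary. As a rough illustration of the kind of race it avoids, and of a narrower alternative, the sketch below guards shared state with a process-wide mutex. `SHARED_COUNTER`, `TEST_LOCK`, `with_shared_state`, and `reset_then_increment` are hypothetical names and do not come from this repository.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

// Hypothetical stand-in for state shared between tests (for example, a cache directory).
static SHARED_COUNTER: AtomicUsize = AtomicUsize::new(0);

// Process-wide lock; tests that touch the shared state take it first.
static TEST_LOCK: Mutex<()> = Mutex::new(());

/// Runs `body` while holding the lock, so callers never interleave.
fn with_shared_state<T>(body: impl FnOnce() -> T) -> T {
    // A panicking test poisons the mutex; recover the guard so later tests still run.
    let _guard: MutexGuard<'_, ()> = TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    body()
}

/// Resets the shared state, mutates it, and reports what it observed.
/// Without the lock, a concurrent caller could reset the counter between
/// the increment and the final load.
fn reset_then_increment() -> usize {
    with_shared_state(|| {
        SHARED_COUNTER.store(0, Ordering::SeqCst);
        SHARED_COUNTER.fetch_add(1, Ordering::SeqCst);
        SHARED_COUNTER.load(Ordering::SeqCst)
    })
}
```

A lock like this only protects the tests that opt in, whereas `--test-threads=1` covers the whole suite at the cost of slower runs.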

.gitignore

Lines changed: 1 addition & 6 deletions
@@ -12,13 +12,8 @@ Cargo.lock
 .DS_Store
 Thumbs.db

-# Output files
-*.md
-!README.md
-!CHANGELOG.md
-!docs/**/*.md
-
 # Logs
 *.log

 samples/
+docs/*.md

CHANGELOG.md

Lines changed: 16 additions & 9 deletions
@@ -5,7 +5,6 @@ All notable changes to this project will be documented in this file.
 ## v0.5.0

 - **BREAKING CHANGES**
-  - Environment variable `CB_DIFF_CONTEXT_LINES` is no longer used; diff configuration is now handled explicitly through `DiffConfig`
   - Cache file locations changed to project-specific paths to prevent collisions

 - **Critical Bug Fixes**
@@ -39,26 +38,34 @@ All notable changes to this project will be documented in this file.
   - Fixed inconsistent file tree visualization between auto-diff and standard modes

 - **Testing & Quality**
-  - Added comprehensive integration test suite with 11 tests covering:
-    - Determinism verification (5 tests)
-    - Auto-diff workflows (6 tests)
+  - Added comprehensive integration test suite with tests covering:
+    - Determinism verification
+    - Auto-diff workflows
     - Cache collision prevention
     - Configuration change detection
     - Error recovery scenarios
-  - All tests use `#[serial]` attribute to prevent race conditions
+  - Fixed test race conditions by running tests serially in CI (`--test-threads=1`)
   - Added `pretty_assertions` for better test output
+  - Fixed all clippy warnings and enforced `-D warnings` in CI

 - **Dependencies**
-  - Added `fs2 = "0.4.3"` for file locking
-  - Added `serde_json = "1.0"` for structured cache format
-  - Added `serial_test = "3.0"` for test serialization
-  - Added `pretty_assertions = "1.4"` for enhanced test output
+  - Added `fs2` for file locking
+  - Added `serde_json` for structured cache format
+  - Added `serial_test` for test serialization
+  - Added `pretty_assertions` for enhanced test output
+  - Added `encoding_rs` for enhanced encoding detection and transcoding

 - **Migration**
   - Automatic detection and cleanup of old markdown-based cache files (`last_canonical.md`, etc.)
   - First run after upgrade will clear old cache format to prevent conflicts
   - CLI interface remains fully backward compatible

+- **Code Quality & Maintenance**
+  - Fixed all clippy warnings including type complexity, collapsible if statements, and redundant closures
+  - Updated CI workflow to prevent race conditions in tests
+  - Improved binary file detection with better encoding strategy handling
+  - Enhanced error handling for edge cases and file system operations
+
 ## v0.4.0


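
The changelog mentions that cache files moved to project-specific paths to prevent collisions, but this diff does not show how the path is derived. One common approach, sketched here purely as an assumption, is to hash the project root and use the digest in the file name. `project_cache_path` and the `state_<hash>.json` naming are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Builds a cache file path that differs for each project root.
/// Hypothetical helper; the repository's real scheme may differ.
fn project_cache_path(cache_dir: &Path, project_root: &Path) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    project_root.hash(&mut hasher);
    // DefaultHasher's algorithm is not guaranteed to stay the same across
    // Rust releases, so a persistent cache would likely want a fixed hash
    // function instead. It is used here only to keep the sketch std-only.
    cache_dir.join(format!("state_{:016x}.json", hasher.finish()))
}
```

Two checkouts in different directories then map to different cache files, while repeated runs in the same directory reuse one file.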
Cargo.lock

Lines changed: 37 additions & 0 deletions
Generated file; diff not rendered by default.

Cargo.toml

Lines changed: 5 additions & 2 deletions
@@ -21,12 +21,15 @@ env_logger = "0.11.8"
 rayon = { version = "1.10", optional = true }
 serde = { version = "1.0.219", features = ["derive"] }
 toml = "0.9.5"
-similar = "2.4.0"
+similar = "2.7.0"
 tempfile = "3.22.0"
 tiktoken-rs = "0.7.0"
-once_cell = "1.19.0"
+once_cell = "1.21.3"
 fs2 = "0.4.3"
 serde_json = "1.0.143"
+crossbeam-channel = "0.5.15"
+num_cpus = "1.17.0"
+encoding_rs = "0.8.35"

 [features]
 default = ["parallel"]

README.md

Lines changed: 73 additions & 12 deletions
@@ -19,31 +19,31 @@ It's a command-line utility that recursively processes directories and creates c
 ## Core Features


--**Blazing Fast & Parallel by Default:**
+-**Blazing Fast & Parallel by Default:**
 Processes thousands of files in seconds by leveraging all available CPU cores.

-- 🧠 **Smart & Efficient File Discovery:**
+- 🧠 **Smart & Efficient File Discovery:**
 Respects `.gitignore` and custom ignore patterns out-of-the-box using optimized, parallel directory traversal.

-- 💾 **Memory-Efficient Streaming:**
+- 💾 **Memory-Efficient Streaming:**
 Handles massive files with ease by reading and writing line-by-line, keeping memory usage low.

-- 🌳 **Clear File Tree Visualization:**
+- 🌳 **Clear File Tree Visualization:**
 Generates an easy-to-read directory structure at the top of the output file.

-- 🔍 **Powerful Filtering & Preview:**
+- 🔍 **Powerful Filtering & Preview:**
 Easily include only the file extensions you need and use the instant `--preview` mode to see what will be processed.

-- ⚙️ **Configuration-First:**
+- ⚙️ **Configuration-First:**
 Use a `.context-builder.toml` file to store your preferences for consistent, repeatable outputs.

-- 🔁 **Automatic Per-File Diffs:**
+- 🔁 **Automatic Per-File Diffs:**
 When enabled, automatically generates a clean, noise-reduced diff showing what changed between snapshots.

-- ✂️ **Diff-Only Mode:**
+- ✂️ **Diff-Only Mode:**
 Output only the change summary and modified file diffs—no full file bodies—to minimize token usage.

-- 🧪 **Accurate Token Counting:**
+- 🧪 **Accurate Token Counting:**
 Get real tokenizer–based estimates with `--token-count` to plan your prompt budgets.


@@ -57,6 +57,41 @@ It's a command-line utility that recursively processes directories and creates c
 cargo install context-builder
 ```

+
+### If you don't have Rust installed
+
+Context Builder is distributed via crates.io. We do not ship pre-built binaries yet, so you need a Rust toolchain.
+
+
+#### Quick install (Linux/macOS):
+
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+Follow the prompt, then restart your shell
+
+#### Windows (PowerShell):
+
+```powershell
+irm https://sh.rustup.rs -UseBasicParsing | Invoke-Expression
+```
+
+After installation, ensure Cargo is on your PATH:
+
+```bash
+cargo --version
+```
+
+Then install Context Builder:
+```bash
+cargo install context-builder
+```
+
+Update later with:
+```bash
+cargo install context-builder --force
+```
+
 ### From source

 ```bash
@@ -100,6 +135,15 @@ context-builder --token-count
 # Add line numbers to all code blocks
 context-builder --line-numbers

+# Skip all confirmation prompts (auto-answer yes)
+context-builder --yes
+
+# Output only diffs (requires auto-diff & timestamped output)
+context-builder --diff-only
+
+# Clear cached project state (resets auto-diff baseline & removes stored state)
+context-builder --clear-cache
+
 # Combine multiple options for a powerful workflow
 context-builder -d ./src -f rs -f toml -i tests --line-numbers -o rust_context.md
 ```
@@ -129,6 +173,9 @@ auto_diff = true
 # Set to true to greatly reduce token usage when you just need what's changed.
 diff_only = false

+# Number of context lines to show around changes in diffs (default: 3)
+diff_context_lines = 5
+
 # File extensions to include
 filter = ["rs", "toml", "md"]

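
To make the meaning of `diff_context_lines` concrete, here is a small std-only sketch of how a context count widens the displayed region around each changed line. It is not taken from the repository; `context_ranges` is a hypothetical helper, and it assumes `changed` holds zero-based line indices sorted in ascending order.

```rust
/// Returns inclusive (start, end) line ranges to display around changed lines.
/// Assumes `changed` is sorted ascending and uses zero-based indices.
fn context_ranges(changed: &[usize], total_lines: usize, context: usize) -> Vec<(usize, usize)> {
    let mut ranges: Vec<(usize, usize)> = Vec::new();
    if total_lines == 0 {
        return ranges;
    }
    for &line in changed {
        if line >= total_lines {
            continue; // ignore indices past the end of the file
        }
        let start = line.saturating_sub(context);
        let end = (line + context).min(total_lines - 1);
        // Merge with the previous range when the two overlap or touch.
        if let Some(last) = ranges.last_mut() {
            if start <= last.1 + 1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        ranges.push((start, end));
    }
    ranges
}
```

With a single change at line 10 in a 100-line file, a context of 3 yields the range 7 to 13, while the value 5 used in the sample config yields 5 to 15.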
@@ -137,6 +184,19 @@ ignore = ["target", "node_modules", ".git"]

 # Add line numbers to code blocks
 line_numbers = true
+
+# Preview mode: only show file tree without generating output
+preview = false
+
+# Token counting mode
+token_count = false
+
+# Automatically answer yes to all prompts
+yes = false
+
+# Encoding handling strategy for non-UTF-8 files
+# Options: "detect" (default), "strict", "skip"
+encoding_strategy = "detect"
 ```

 ---
@@ -161,10 +221,11 @@ If you also set `diff_only = true` (or pass `--diff-only`), the full “## Files
 - `--preview` - Preview mode: only show the file tree, don't generate output.
 - `--token-count` - Token count mode: accurately count the total token count of the final document using a real tokenizer.
 - `--line-numbers` - Add line numbers to code blocks in the output.
-- `--diff-only` - With `--auto-diff` + `--timestamped-output`, output only change summary + modified file diffs (omit full file bodies).
+- `-y, --yes` - Automatically answer yes to all prompts (skip confirmation dialogs).
+- `--diff-only` - With auto-diff + timestamped output, output only change summary + modified file diffs (omit full file bodies).
+- `--clear-cache` - Remove stored state used for auto-diff; next run becomes a fresh baseline.
 - `-h, --help` - Show help information.
 - `-V, --version` - Show version information.
-
 ---

 ## Token Counting
@@ -189,4 +250,4 @@ See **[CHANGELOG.md](CHANGELOG.md)** for a complete history of releases and chan

 ## License

-This project is licensed under the MIT License. See the **[LICENSE](LICENSE)** file for details.
+This project is licensed under the MIT License. See the **[LICENSE](LICENSE)** file for details.
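
The README change above documents three `encoding_strategy` values with `"detect"` as the default. A minimal sketch of parsing that option follows. The `EncodingStrategy` enum and `resolve_strategy` function are hypothetical names, and the sketch covers only parsing, since this diff does not show what each strategy does to file contents.

```rust
use std::str::FromStr;

/// The three option names listed in the README; "detect" is the documented default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum EncodingStrategy {
    #[default]
    Detect,
    Strict,
    Skip,
}

impl FromStr for EncodingStrategy {
    type Err = String;

    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "detect" => Ok(Self::Detect),
            "strict" => Ok(Self::Strict),
            "skip" => Ok(Self::Skip),
            other => Err(format!("unknown encoding_strategy: {other:?}")),
        }
    }
}

/// Resolves an optional config value, falling back to the default when unset.
fn resolve_strategy(raw: Option<&str>) -> Result<EncodingStrategy, String> {
    match raw {
        Some(value) => value.parse(),
        None => Ok(EncodingStrategy::default()),
    }
}
```

Rejecting unknown strings, rather than silently falling back to the default, is a design choice of this sketch so that a typo in the config file surfaces early.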

benches/context_bench.rs

Lines changed: 4 additions & 0 deletions
@@ -208,6 +208,8 @@ fn bench_scenario(c: &mut Criterion, spec: DatasetSpec, line_numbers: bool) {
         line_numbers,
         yes: true,
         diff_only: false,
+
+        clear_cache: false,
     };

     let prompter = NoPrompt;
@@ -249,6 +251,8 @@ fn bench_scenario(c: &mut Criterion, spec: DatasetSpec, line_numbers: bool) {
             line_numbers: args.line_numbers,
             yes: true,
             diff_only: false,
+
+            clear_cache: false,
         },
         Config::default(),
         &prompter,
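
The new `clear_cache` field defaults to `false` in the benchmarks. According to the README text in this commit, setting it removes the stored auto-diff state so the next run starts from a fresh baseline. The sketch below shows one way such a flag could be honored. `clear_cache_if_requested` and the idea of a single cache directory are assumptions, not the repository's implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes the cache directory when `clear_cache` is set.
/// Returns Ok(true) if something was removed, Ok(false) otherwise.
/// Hypothetical helper; the real cache layout is not shown in this diff.
fn clear_cache_if_requested(clear_cache: bool, cache_dir: &Path) -> io::Result<bool> {
    if !clear_cache || !cache_dir.exists() {
        return Ok(false);
    }
    fs::remove_dir_all(cache_dir)?;
    Ok(true)
}
```

Treating a missing directory as a no-op keeps the flag safe to pass on a first run, before any state has been written.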
