Merged
109 changes: 109 additions & 0 deletions skills/openclaw-native/large-file-interceptor/SKILL.md
@@ -0,0 +1,109 @@
---
name: large-file-interceptor
version: "1.0"
category: openclaw-native
description: Detects oversized files that would blow the context window, generates structural exploration summaries, and stores compact references — preventing a single paste from consuming the entire budget.
stateful: true
---

# Large File Interceptor

## What it does

A single large file paste can consume 60–80% of the context window, leaving no room for actual work. Large File Interceptor detects oversized files, generates a structural summary (schema, columns, imports, key definitions), stores the original externally, and replaces it with a compact reference card.

Inspired by [lossless-claw](https://github.com/Martian-Engineering/lossless-claw)'s large file interception layer, which automatically extracts files exceeding 25k tokens.

## When to invoke

- Before processing any file the agent reads or receives — check size first
- When context budget is running low and large files may be the cause
- After a paste or file read — retroactively scan for oversized content
- Periodically to audit what's consuming the most context budget

## How to use

```bash
python3 intercept.py --scan <path> # Scan a file or directory
python3 intercept.py --scan <path> --threshold 10000 # Custom token threshold
python3 intercept.py --summarize <file> # Generate structural summary for a file
python3 intercept.py --list # List all intercepted files
python3 intercept.py --restore <ref-id> # Retrieve original file content
python3 intercept.py --audit # Show context budget impact
python3 intercept.py --status # Last scan summary
python3 intercept.py --format json # Machine-readable output
```

## Structural exploration summaries

The interceptor generates different summaries based on file type:

| File type | Summary includes |
|---|---|
| JSON/YAML | Top-level schema, key types, array lengths, nested depth |
| CSV/TSV | Column names, row count, sample values, data types per column |
| Python/JS/TS | Imports, class definitions, function signatures, export list |
| Markdown | Heading structure, word count per section, link count |
| Log files | Time range, error count, unique error patterns, frequency |
| Binary/Other | File size, MIME type, magic bytes |
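
As an illustration of the JSON row in the table above, a structural summary could be produced roughly as follows. This is a minimal sketch rather than the code in `intercept.py`: the function names `describe`, `max_depth`, and `summarize_json` and the exact output wording are assumptions made for this example.

```python
import json


def max_depth(value) -> int:
    """Nesting depth: scalars count as 0, each container level adds 1."""
    if isinstance(value, dict):
        return 1 + max((max_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((max_depth(v) for v in value), default=0)
    return 0


def describe(value) -> str:
    """One-line description of a JSON value's type and size."""
    if isinstance(value, dict):
        return f"object ({len(value)} keys)"
    if isinstance(value, list):
        return "empty array" if not value else f"array of {len(value)} items"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def summarize_json(text: str) -> list[str]:
    """Hypothetical summarizer: top-level schema, key types, array lengths, depth."""
    data = json.loads(text)
    lines = [f"Root: {describe(data)}"]
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f'"{key}": {describe(value)}')
    lines.append(f"Nested depth: {max_depth(data)}")
    return lines


if __name__ == "__main__":
    sample = '{"data": [{"id": 1}, {"id": 2}], "metadata": {"page": 1}, "errors": []}'
    print("\n".join(summarize_json(sample)))
```

The summary only walks the parsed structure once per helper, so its cost is linear in the file size while its output stays a handful of lines.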

## Reference card format

When a file is intercepted, a copy of the original is stored in `~/.openclaw/lcm-files/` and the file's content in context is replaced with a reference card:

```
[FILE REFERENCE: ref-001]
Original: /path/to/large-file.json
Size: 145,230 bytes (~36,307 tokens)
Type: JSON — API response payload

Structure:
- Root: object with 3 keys
- "data": array of 1,247 objects
- "metadata": object (pagination, timestamps)
- "errors": empty array

Key fields in data[]: id, name, email, created_at, status
Sample: {"id": 1, "name": "...", "status": "active"}

To retrieve full content: python3 intercept.py --restore ref-001
```
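
A card in this layout could be assembled from a few fields, as in the sketch below. The function `render_reference_card` and its parameters are hypothetical names chosen for illustration; only the card layout itself comes from the example above.

```python
def render_reference_card(ref_id, original_path, size_bytes, tokens,
                          file_type, structure_lines):
    """Build a reference card string in the layout shown above (illustrative only)."""
    lines = [
        f"[FILE REFERENCE: {ref_id}]",
        f"Original: {original_path}",
        # The ',' format spec inserts thousands separators, e.g. 145,230
        f"Size: {size_bytes:,} bytes (~{tokens:,} tokens)",
        f"Type: {file_type}",
        "",
        "Structure:",
    ]
    lines += [f"- {line}" for line in structure_lines]
    lines += ["", f"To retrieve full content: python3 intercept.py --restore {ref_id}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reference_card(
        "ref-001", "/path/to/large-file.json", 145_230, 36_307,
        "JSON", ["Root: object with 3 keys", '"errors": empty array'],
    ))
```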

## Procedure

**Step 1 — Scan before processing**

```bash
python3 intercept.py --scan /path/to/file.json
```

If the file exceeds the token threshold (default: 25,000 tokens), the interceptor generates a structural summary, stores a copy of the original, and records a reference card for it.
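
The threshold check can be approximated from file size alone. The figures in this document (145,230 bytes for about 36,307 tokens, and 25,000 tokens for about 100KB) imply roughly 4 bytes per token, so the sketch below assumes that ratio. `estimate_tokens` and `exceeds_threshold` are hypothetical helpers, not confirmed functions of `intercept.py`, which may count tokens differently.

```python
import os

BYTES_PER_TOKEN = 4         # assumption: ratio implied by the figures in this document
DEFAULT_THRESHOLD = 25_000  # default token threshold stated above


def estimate_tokens(num_bytes: int) -> int:
    """Rough token estimate from a byte count (hypothetical helper)."""
    return num_bytes // BYTES_PER_TOKEN


def exceeds_threshold(path: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True if the file at `path` is estimated to exceed `threshold` tokens."""
    return estimate_tokens(os.path.getsize(path)) > threshold


if __name__ == "__main__":
    print(estimate_tokens(145_230))  # 36307, the token count on the example reference card
```

Using the file size avoids reading the file at all, which matters when the point of the check is to keep large content out of context.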

**Step 2 — Audit context impact**

```bash
python3 intercept.py --audit
```

Shows all files in the current workspace ranked by token impact, with recommendations for which to intercept.
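
The audit totals can be derived from the per-file token counts kept in the state file. The sketch below is an assumption about how such totals might be computed, using the field names from the state schema in this PR (`original_tokens`, `summary_tokens`, `original_path`); the function `audit` and its return shape are hypothetical.

```python
def audit(intercepted_files):
    """Rank entries by original token cost and total up the savings (illustrative)."""
    ranked = sorted(intercepted_files, key=lambda f: f["original_tokens"], reverse=True)
    original = sum(f["original_tokens"] for f in ranked)
    summary = sum(f["summary_tokens"] for f in ranked)
    saved = original - summary
    return {
        "ranked_paths": [f["original_path"] for f in ranked],
        "original_tokens": original,
        "summary_tokens": summary,
        "tokens_saved": saved,
        # Fraction of the original token cost removed from context
        "compression_ratio": saved / original if original else 0.0,
    }


if __name__ == "__main__":
    # Token counts taken from the example state file in this PR
    files = [
        {"original_path": "data/api-response.json", "original_tokens": 36307, "summary_tokens": 180},
        {"original_path": "logs/server.log", "original_tokens": 52800, "summary_tokens": 220},
    ]
    print(audit(files))
```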

**Step 3 — Restore when needed**

```bash
python3 intercept.py --restore ref-001
```

Retrieves the original file content from storage for detailed inspection.
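
Conceptually, a restore is a registry lookup followed by a file read. The following sketch assumes the registry is a list of entries with `ref_id` and `stored_path` fields, as in the state schema in this PR; the function `restore` is a hypothetical stand-in for whatever `intercept.py --restore` does internally, and it assumes the stored copy is UTF-8 text.

```python
from pathlib import Path


def restore(ref_id: str, intercepted_files: list[dict]) -> str:
    """Return the stored original content for `ref_id` (illustrative lookup)."""
    for entry in intercepted_files:
        if entry["ref_id"] == ref_id:
            # Assumes a text file; binary originals would need read_bytes() instead
            return Path(entry["stored_path"]).read_text(encoding="utf-8")
    raise KeyError(f"unknown reference: {ref_id}")
```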

## State

The intercepted file registry and reference cards are stored in `~/.openclaw/skill-state/large-file-interceptor/state.yaml`. Copies of the original files are stored in `~/.openclaw/lcm-files/`.

Fields: `last_scan_at`, `intercepted_files`, `total_tokens_saved`, `scan_history`.
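
To show how these fields relate, the sketch below updates an in-memory state dictionary after a scan. It is an assumption about the bookkeeping, not the actual implementation: `record_scan` is a hypothetical function, the state is a plain dict rather than the YAML file, and the cap of 20 history entries follows the "last 20" note in the state schema in this PR.

```python
from datetime import datetime

MAX_SCAN_HISTORY = 20  # the state schema describes scan_history as a rolling log of the last 20


def record_scan(state, path_scanned, files_checked, new_entries, now=None):
    """Fold one scan's results into the state dict (illustrative bookkeeping)."""
    timestamp = (now or datetime.now()).isoformat()
    saved = sum(e["original_tokens"] - e["summary_tokens"] for e in new_entries)

    state.setdefault("intercepted_files", []).extend(new_entries)
    state["total_tokens_saved"] = state.get("total_tokens_saved", 0) + saved
    state["last_scan_at"] = timestamp

    history = state.setdefault("scan_history", [])
    history.append({
        "scanned_at": timestamp,
        "path_scanned": path_scanned,
        "files_checked": files_checked,
        "files_intercepted": len(new_entries),
        "tokens_saved": saved,
    })
    del history[:-MAX_SCAN_HISTORY]  # drop the oldest entries beyond the cap
    return state
```

Keeping `total_tokens_saved` cumulative while capping `scan_history` means the running total survives even after old scan entries are dropped.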

## Notes

- Never deletes or modifies original files; interception creates a copy plus a reference card
- Token threshold is configurable (default: 25,000 tokens, roughly 100 KB of text)
- Reference cards are typically 200–400 tokens vs. 25,000+ for the original
- Supports recursive directory scanning with `--scan /path/to/dir`
34 changes: 34 additions & 0 deletions skills/openclaw-native/large-file-interceptor/STATE_SCHEMA.yaml
@@ -0,0 +1,34 @@
version: "1.0"
description: Registry of intercepted large files, reference cards, and token savings.
fields:
  last_scan_at:
    type: datetime
  token_threshold:
    type: integer
    default: 25000
    description: Files exceeding this token count are intercepted
  intercepted_files:
    type: list
    description: All intercepted file references
    items:
      ref_id: { type: string, description: "Reference ID (e.g. ref-001)" }
      original_path: { type: string }
      stored_path: { type: string, description: "Path in ~/.openclaw/lcm-files/" }
      file_type: { type: string, description: "Detected file type" }
      original_tokens: { type: integer }
      summary_tokens: { type: integer }
      tokens_saved: { type: integer }
      summary: { type: string, description: "Structural exploration summary" }
      intercepted_at: { type: datetime }
  total_tokens_saved:
    type: integer
    description: Cumulative tokens saved by interception
  scan_history:
    type: list
    description: Rolling log of past scans (last 20)
    items:
      scanned_at: { type: datetime }
      path_scanned: { type: string }
      files_checked: { type: integer }
      files_intercepted: { type: integer }
      tokens_saved: { type: integer }
66 changes: 66 additions & 0 deletions skills/openclaw-native/large-file-interceptor/example-state.yaml
@@ -0,0 +1,66 @@
# Example runtime state for large-file-interceptor
last_scan_at: "2026-03-16T14:30:05.000000"
token_threshold: 25000
intercepted_files:
  - ref_id: ref-001
    original_path: "/Users/you/project/data/api-response.json"
    stored_path: "/Users/you/.openclaw/lcm-files/ref-001_a3b2c1d4e5f6.json"
    file_type: JSON
    original_tokens: 36307
    summary_tokens: 180
    tokens_saved: 36127
    summary: |
      Root: object with 3 keys
      "data": array of 1247 objects
      "metadata": object (5 keys)
      "errors": empty array
      Item keys: id, name, email, created_at, status
    intercepted_at: "2026-03-16T14:30:03.000000"
  - ref_id: ref-002
    original_path: "/Users/you/project/logs/server.log"
    stored_path: "/Users/you/.openclaw/lcm-files/ref-002_f7e8d9c0b1a2.log"
    file_type: Log
    original_tokens: 52800
    summary_tokens: 220
    tokens_saved: 52580
    summary: |
      Total lines: 8450
      Time range: 2026-03-15T00:00 → 2026-03-16T14:29
      Errors: 23, Warnings: 87
      Unique error patterns: 5
      ConnectionError: host N.N.N.N port N
      TimeoutError: request exceeded Nms
      ValueError: invalid JSON at line N
    intercepted_at: "2026-03-16T14:30:04.000000"
total_tokens_saved: 88707
scan_history:
  - scanned_at: "2026-03-16T14:30:05.000000"
    path_scanned: "/Users/you/project"
    files_checked: 48
    files_intercepted: 2
    tokens_saved: 88707
# ── Walkthrough ──────────────────────────────────────────────────────────────
# python3 intercept.py --scan /Users/you/project
#
# Intercepted: api-response.json (36,307 tokens → 180 tokens)
# Reference card:
# [FILE REFERENCE: ref-001]
# Original: /Users/you/project/data/api-response.json
# Size: 145,230 bytes (~36,307 tokens)
# Type: JSON — API response payload
# ...
#
# Intercepted: server.log (52,800 tokens → 220 tokens)
# ...
#
# Scan Complete — 48 files checked, 2 intercepted, ~88,707 tokens saved
#
# python3 intercept.py --audit
#
# Context Budget Audit
# ──────────────────────────────────────────────
# Intercepted files: 2
# Original token cost: ~89,107
# Summary token cost: ~400
# Total tokens saved: ~88,707
# Compression ratio: 99%