ArchieIndian · Mar 16, 2026
diff --git a/skills/openclaw-native/large-file-interceptor/SKILL.md b/skills/openclaw-native/large-file-interceptor/SKILL.md
@@ -0,0 +1,109 @@
+---
+name: large-file-interceptor
+version: "1.0"
+category: openclaw-native
+description: Detects oversized files that would blow the context window, generates structural exploration summaries, and stores compact references — preventing a single paste from consuming the entire budget.
+stateful: true
+---
+
+# Large File Interceptor
+
+## What it does
+
+A single large file paste can consume 60–80% of the context window, leaving no room for actual work. Large File Interceptor detects oversized files, generates a structural summary (schema, columns, imports, key definitions), stores the original externally, and replaces it with a compact reference card.
+
+Inspired by [lossless-claw](https://github.com/Martian-Engineering/lossless-claw)'s large file interception layer, which automatically extracts files exceeding 25k tokens.
+
+## When to invoke
+
+- Before processing any file the agent reads or receives — check size first
+- When context budget is running low and large files may be the cause
+- After a paste or file read — retroactively scan for oversized content
+- Periodically to audit what's consuming the most context budget
+
+## How to use
+
+```bash
+python3 intercept.py --scan <path>                # Scan a file or directory
+python3 intercept.py --scan <path> --threshold 10000  # Custom token threshold
+python3 intercept.py --summarize <file>           # Generate structural summary for a file
+python3 intercept.py --list                       # List all intercepted files
+python3 intercept.py --restore <ref-id>           # Retrieve original file content
+python3 intercept.py --audit                      # Show context budget impact
+python3 intercept.py --status                     # Last scan summary
+python3 intercept.py --format json                # Machine-readable output
+```
+
+## Structural exploration summaries
+
+The interceptor generates different summaries based on file type:
+
+| File type | Summary includes |
+|---|---|
+| JSON/YAML | Top-level schema, key types, array lengths, nested depth |
+| CSV/TSV | Column names, row count, sample values, data types per column |
+| Python/JS/TS | Imports, class definitions, function signatures, export list |
+| Markdown | Heading structure, word count per section, link count |
+| Log files | Time range, error count, unique error patterns, frequency |
+| Binary/Other | File size, MIME type, magic bytes |
+
+## Reference card format
+
+When a file is intercepted, a copy of the original is stored in `~/.openclaw/lcm-files/` and its content in the context is replaced with a reference card:
+
+```
+[FILE REFERENCE: ref-001]
+Original: /path/to/large-file.json
+Size: 145,230 bytes (~36,307 tokens)
+Type: JSON — API response payload
+
+Structure:
+  - Root: object with 3 keys
+  - "data": array of 1,247 objects
+  - "metadata": object (pagination, timestamps)
+  - "errors": empty array
+
+Key fields in data[]: id, name, email, created_at, status
+Sample: {"id": 1, "name": "...", "status": "active"}
+
+To retrieve full content: python3 intercept.py --restore ref-001
+```
+
+## Procedure
+
+**Step 1 — Scan before processing**
+
+```bash
+python3 intercept.py --scan /path/to/file.json
+```
+
+If the file exceeds the token threshold (default: 25,000 tokens), the interceptor generates a structural summary, stores a copy of the original, and records a reference card for it.
+
+**Step 2 — Audit context impact**
+
+```bash
+python3 intercept.py --audit
+```
+
+Shows all files in the current workspace ranked by token impact, with recommendations for which to intercept.
+
+**Step 3 — Restore when needed**
+
+```bash
+python3 intercept.py --restore ref-001
+```
+
+Retrieves the original file content from storage for detailed inspection.
+
+## State
+
+The intercepted file registry and reference cards are stored in `~/.openclaw/skill-state/large-file-interceptor/state.yaml`. Copies of the original files are stored in `~/.openclaw/lcm-files/`.
+
+Fields: `last_scan_at`, `token_threshold`, `intercepted_files`, `total_tokens_saved`, `scan_history`.
+
+## Notes
+
+- Never deletes or modifies original files — intercept creates a copy + reference
+- Token threshold is configurable (default: 25,000 tokens, roughly 100 KB of text)
+- Reference cards are typically roughly 200–400 tokens, compared with 25,000+ tokens for the original file
+- Supports recursive directory scanning with `--scan /path/to/dir`
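The SKILL.md figures (145,230 bytes shown as ~36,307 tokens, and 25,000 tokens equated with about 100 KB) are consistent with a bytes-divided-by-four estimate, though the diff does not show how `intercept.py` actually counts tokens. The sketch below assumes that heuristic and illustrates the JSON row of the summary table. The names `estimate_tokens`, `exceeds_threshold`, `depth`, and `summarize_json` are hypothetical and not taken from the PR.

```python
import json

BYTES_PER_TOKEN = 4         # assumption: inferred from 145,230 bytes ~ 36,307 tokens
DEFAULT_THRESHOLD = 25_000  # default token threshold stated in SKILL.md


def estimate_tokens(size_bytes: int) -> int:
    """Rough token estimate from a byte count (a heuristic, not a real tokenizer)."""
    return size_bytes // BYTES_PER_TOKEN


def exceeds_threshold(size_bytes: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """True when the estimated token count is strictly above the threshold."""
    return estimate_tokens(size_bytes) > threshold


def depth(value) -> int:
    """Nesting depth: scalars are 0, each dict or list level adds 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def summarize_json(text: str) -> list[str]:
    """Describe the top-level shape of a JSON document as short lines."""
    root = json.loads(text)
    if not isinstance(root, dict):
        return [f"Root: {type(root).__name__}, depth {depth(root)}"]
    lines = [f"Root: object with {len(root)} keys, depth {depth(root)}"]
    for key, value in root.items():
        if isinstance(value, list):
            lines.append(f'"{key}": array of {len(value)} items')
        elif isinstance(value, dict):
            lines.append(f'"{key}": object ({len(value)} keys)')
        else:
            lines.append(f'"{key}": {type(value).__name__}')
    return lines
```

Only the top level is described, so the output stays small however large the payload is. A real tokenizer could replace `estimate_tokens` without touching the rest.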
diff --git a/skills/openclaw-native/large-file-interceptor/STATE_SCHEMA.yaml b/skills/openclaw-native/large-file-interceptor/STATE_SCHEMA.yaml
@@ -0,0 +1,34 @@
+version: "1.0"
+description: Registry of intercepted large files, reference cards, and token savings.
+fields:
+  last_scan_at:
+    type: datetime
+  token_threshold:
+    type: integer
+    default: 25000
+    description: Files exceeding this token count are intercepted
+  intercepted_files:
+    type: list
+    description: All intercepted file references
+    items:
+      ref_id:         { type: string, description: "Reference ID (e.g. ref-001)" }
+      original_path:  { type: string }
+      stored_path:    { type: string, description: "Path in ~/.openclaw/lcm-files/" }
+      file_type:      { type: string, description: "Detected file type" }
+      original_tokens: { type: integer }
+      summary_tokens: { type: integer }
+      tokens_saved:   { type: integer }
+      summary:        { type: string, description: "Structural exploration summary" }
+      intercepted_at: { type: datetime }
+  total_tokens_saved:
+    type: integer
+    description: Cumulative tokens saved by interception
+  scan_history:
+    type: list
+    description: Rolling log of past scans (last 20)
+    items:
+      scanned_at:       { type: datetime }
+      path_scanned:     { type: string }
+      files_checked:    { type: integer }
+      files_intercepted: { type: integer }
+      tokens_saved:     { type: integer }
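The schema implies a few invariants: `tokens_saved` is `original_tokens` minus `summary_tokens`, `total_tokens_saved` is the sum over all entries, and `scan_history` keeps only the last 20 scans. A minimal sketch of state updates that respect them follows. The field names come from STATE_SCHEMA.yaml, but the helpers `make_entry`, `record`, and `append_scan` are hypothetical and say nothing about how `intercept.py` is written.

```python
from datetime import datetime


def make_entry(seq, original_path, stored_path, file_type,
               original_tokens, summary, summary_tokens, now=None):
    """Build one `intercepted_files` item using the schema's field names."""
    now = now or datetime.now()
    return {
        "ref_id": f"ref-{seq:03d}",  # e.g. ref-001
        "original_path": original_path,
        "stored_path": stored_path,
        "file_type": file_type,
        "original_tokens": original_tokens,
        "summary_tokens": summary_tokens,
        "tokens_saved": original_tokens - summary_tokens,
        "summary": summary,
        # Same timestamp shape as example-state.yaml (microseconds, no timezone).
        "intercepted_at": now.isoformat(timespec="microseconds"),
    }


def record(state, entry):
    """Append an entry and recompute the cumulative saving from scratch."""
    files = state.setdefault("intercepted_files", [])
    files.append(entry)
    state["total_tokens_saved"] = sum(f["tokens_saved"] for f in files)
    return state


def append_scan(state, scan, keep=20):
    """Add a scan record and trim the rolling log to the last `keep` items."""
    history = state.setdefault("scan_history", [])
    history.append(scan)
    del history[:-keep]  # no-op while the log has at most `keep` items
    state["last_scan_at"] = scan["scanned_at"]
    return state
```

Recomputing `total_tokens_saved` from the entries, rather than incrementing it, keeps the total from drifting if an entry is ever edited or removed.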
diff --git a/skills/openclaw-native/large-file-interceptor/example-state.yaml b/skills/openclaw-native/large-file-interceptor/example-state.yaml
@@ -0,0 +1,66 @@
+# Example runtime state for large-file-interceptor
+last_scan_at: "2026-03-16T14:30:05.000000"
+token_threshold: 25000
+intercepted_files:
+  - ref_id: ref-001
+    original_path: "/Users/you/project/data/api-response.json"
+    stored_path: "/Users/you/.openclaw/lcm-files/ref-001_a3b2c1d4e5f6.json"
+    file_type: JSON
+    original_tokens: 36307
+    summary_tokens: 180
+    tokens_saved: 36127
+    summary: |
+      Root: object with 3 keys
+        "data": array of 1247 objects
+        "metadata": object (5 keys)
+        "errors": empty array
+      Item keys: id, name, email, created_at, status
+    intercepted_at: "2026-03-16T14:30:03.000000"
+  - ref_id: ref-002
+    original_path: "/Users/you/project/logs/server.log"
+    stored_path: "/Users/you/.openclaw/lcm-files/ref-002_f7e8d9c0b1a2.log"
+    file_type: Log
+    original_tokens: 52800
+    summary_tokens: 220
+    tokens_saved: 52580
+    summary: |
+      Total lines: 8450
+      Time range: 2026-03-15T00:00 → 2026-03-16T14:29
+      Errors: 23, Warnings: 87
+      Unique error patterns: 5
+        ConnectionError: host N.N.N.N port N
+        TimeoutError: request exceeded Nms
+        ValueError: invalid JSON at line N
+    intercepted_at: "2026-03-16T14:30:04.000000"
+total_tokens_saved: 88707
+scan_history:
+  - scanned_at: "2026-03-16T14:30:05.000000"
+    path_scanned: "/Users/you/project"
+    files_checked: 48
+    files_intercepted: 2
+    tokens_saved: 88707
+# ── Walkthrough ──────────────────────────────────────────────────────────────
+# python3 intercept.py --scan /Users/you/project
+#
+#   Intercepted: api-response.json (36,307 tokens → 180 tokens)
+#   Reference card:
+#   [FILE REFERENCE: ref-001]
+#   Original: /Users/you/project/data/api-response.json
+#   Size: 145,230 bytes (~36,307 tokens)
+#   Type: JSON — API response payload
+#   ...
+#
+#   Intercepted: server.log (52,800 tokens → 220 tokens)
+#   ...
+#
+#   Scan Complete — 48 files checked, 2 intercepted, ~88,707 tokens saved
+#
+# python3 intercept.py --audit
+#
+#   Context Budget Audit
+#   ──────────────────────────────────────────────
+#     Intercepted files:     2
+#     Original token cost:   ~89,107
+#     Summary token cost:    ~400
+#     Total tokens saved:    ~88,707
+#     Compression ratio:     99%
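The audit figures in the walkthrough can be reproduced from the `intercepted_files` entries alone. The sketch below assumes the ratio is tokens saved divided by original tokens, truncated to a whole percent. That matches the 99% shown (rounding would give 100%), but the diff does not confirm how `intercept.py` computes it. The function name `audit` and the keys of its result are hypothetical.

```python
def audit(intercepted_files):
    """Summarize token costs across a list of `intercepted_files` entries."""
    original = sum(f["original_tokens"] for f in intercepted_files)
    summary = sum(f["summary_tokens"] for f in intercepted_files)
    saved = original - summary
    # Integer division truncates; guard against an empty registry.
    ratio = 100 * saved // original if original else 0
    return {
        "intercepted_files": len(intercepted_files),
        "original_token_cost": original,
        "summary_token_cost": summary,
        "total_tokens_saved": saved,
        "compression_ratio_pct": ratio,
    }


# The two entries from example-state.yaml, reduced to the fields used here.
example = [
    {"original_tokens": 36307, "summary_tokens": 180},
    {"original_tokens": 52800, "summary_tokens": 220},
]
report = audit(example)
```

With the example state, `report` holds an original cost of 89,107 tokens, a summary cost of 400, and 88,707 tokens saved, in line with the walkthrough.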