
Implement modular Azure Blob Storage cleanup with LLM-assisted triage#1

Draft
Copilot wants to merge 12 commits into main from copilot/clean-azure-blob-storage

Conversation

Contributor

Copilot AI commented Jan 28, 2026

Implements a human-in-the-loop system for cleaning Azure Blob Storage containers using LLM pre-triage with conservative decision gates. All state is tracked in local CSV/JSON files; no external infrastructure is required beyond the storage account and an LLM API.

Architecture

Core modules (pure Python, testable without Jupyter):

  • config.py - Environment-based configuration with validation
  • models.py - Data classes (BlobMetadata, TriageDecision, QueueEntry, AuditEntry, FewShotExample)
  • storage.py - Azure Blob operations (list, download, delete)
  • extractors.py - Text preview extraction (txt, json, xml, pdf, docx) with robust encoding detection
  • llm.py - LLM client abstraction supporting OpenAI, Azure OpenAI, and Ollama
  • triage.py - Conservative decision engine with 3-gate safety system
  • queue.py - CSV-based queue management (to_review, to_delete, to_keep)
  • audit.py - Append-only audit logging
  • policy.py - Policy loading with MD5 versioning and few-shot example management
  • deletion.py - Safe deletion with 100% coverage verification
  • cleanup.py - Main orchestrator coordinating all modules

UI layer (Jupyter widgets):

  • ui.py - ReviewUI for human review, ProgressDisplay for statistics

Conservative LLM Triage

Three safety gates ensure only high-confidence decisions are auto-labeled:

  1. Confidence threshold ≥0.95 required
  2. Self-consistency - 3-5 independent LLM runs must unanimously agree
  3. Never-delete constraints - Blocks deletion of legal/PII files, recent files (<7 days), and files in retention folders

Any gate failure → routes to human review queue.
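The gate logic can be sketched as follows. This is an illustrative sketch, not the actual `triage.py` API: the names `gate_decision`, `NEVER_DELETE_PREFIXES`, and the exact threshold values are assumptions based on the description above.

```python
CONFIDENCE_THRESHOLD = 0.95          # Gate 1 threshold (assumed default)
RECENT_DAYS = 7                      # Gate 3 recency window
NEVER_DELETE_PREFIXES = ("legal/", "pii/", "retention/")  # hypothetical protected folders

def gate_decision(runs, blob_path, age_days):
    """Apply the three safety gates to a set of independent LLM runs.

    `runs` is a list of (label, confidence) tuples; any gate failure
    routes the blob to the human review queue.
    """
    labels = {label for label, _ in runs}
    # Gate 2: self-consistency -- every run must agree on one label
    if len(labels) != 1:
        return "human_review"
    label = labels.pop()
    # Gate 1: every run must clear the confidence threshold
    if any(conf < CONFIDENCE_THRESHOLD for _, conf in runs):
        return "human_review"
    # Gate 3: never-delete constraints on recency and protected folders
    if label == "delete" and (
        age_days < RECENT_DAYS
        or any(blob_path.startswith(p) for p in NEVER_DELETE_PREFIXES)
    ):
        return "human_review"
    return label
```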

Usage

from dataatelier import BlobCleanup, Config
from dataatelier.ui import ReviewUI

# Initialize and run batch triage
cleanup = BlobCleanup(Config())
manifest = cleanup.create_manifest()
result = cleanup.run_triage_batch(batch_size=20)

# Human review in notebook
review_ui = ReviewUI(cleanup.config)
display(review_ui.create_widget())

Human decisions automatically update few-shot examples and can trigger policy refinement, improving LLM accuracy iteratively.

Testing

Unit tests cover models, queue operations, policy management, and text extraction without requiring Azure or LLM access.

Files

  • dataatelier/ - 11 modules, ~3000 LOC
  • tests/ - 4 test modules
  • azure_blob_cleanup.ipynb - Clean notebook interface
  • llm_policy.md - Default policy template with conservative rules
  • .env.example - Configuration template
Original prompt

Below is a lightweight, notebook-first plan to clean an Azure Blob Storage container with a human-in-the-loop process and an LLM working in the background. It avoids deploying external infrastructure beyond your storage account and a single local notebook session. The LLM proposes labels conservatively; only completely certain results become keep/delete, and all others go to human_review.

Objectives and constraints

  • A single Jupyter/VS Code notebook drives the entire process end to end.
  • Local CSV/JSON files track state; no databases or extra Azure services.
  • The LLM runs in the background to pre-triage by content; it only promotes items to keep/delete when completely certain, otherwise routing them to human_review.
  • Human decisions update the LLM logic via:
    • an evolving policy/rubric file (rules-as-prompt)
    • a small, curated few-shot examples store (positive and negative exemplars)
Minimal bill of materials

  • Runtime: Python + Jupyter (or VS Code notebooks).
  • Libraries:
    • azure-storage-blob (list, read, delete)
    • pandas (manifests, queues)
    • ipywidgets (light review UI)
    • chardet (robust text decoding; optional)
  • A minimal text extractor for the non-plain-text types you need to review (choose only those present in your data):
    • txt/json/md/log/xml: built-in decoding
    • pdf: pdfminer.six (optional)
    • docx: python-docx (optional)
  • LLM access:
    • Option A (simplest): a cloud API (OpenAI/Azure OpenAI) with strict prompt/JSON output and small contexts
    • Option B (infra-free): a local LLM via Ollama; lower quality but zero external calls
  • No additional Azure services (e.g., Search, Functions) are required.
Core artifacts (local files)

  • manifest.csv: inventory of blobs with metadata + a short text preview/sample.
  • Queues:
    • to_review.csv: items awaiting human judgment.
    • to_delete.csv: items approved for deletion (auto or human).
    • to_keep.csv: items explicitly kept (auto or human).
  • llm_policy.md: human-readable rubric defining what constitutes keep vs delete; updated over time.
  • few_shot_examples.jsonl: compact set of labeled examples (keep/delete) with reasons; appended to by human labeling.
  • llm_predictions.parquet: cached LLM outputs per blob (label, confidence, reason, versions of the policy/examples used).
  • audit_log.csv: append-only log of each decision (who/what, when, before/after).
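The CSV queues can be maintained with a small append helper. This is a minimal sketch, assuming a hypothetical `QueueRow` schema (the real `queue.py` columns may differ):

```python
import csv
import os
from dataclasses import dataclass, asdict, fields

@dataclass
class QueueRow:
    name: str
    size: int
    last_modified: str
    label: str        # keep | delete | human_review
    source: str       # auto | human
    reason: str

def append_queue(path: str, row: QueueRow) -> None:
    """Append one row to a CSV queue file, writing the header on first use."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(QueueRow)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(row))
```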
Process flow

Inventory and preview
  • List blobs (optional prefix, optional max N for a pilot).
  • For text-like files, read the first N KB (e.g., 2–4 KB) to create a preview snippet; store it in manifest.csv.
  • For binaries you care about, optionally extract a small text sample using the minimal extractor for that file type; otherwise rely on the filename plus human review.
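The preview step might decode blob bytes as sketched below. This is illustrative only: it tries UTF-8 and falls back to latin-1 (which never fails), whereas the real extractor could slot in chardet for smarter detection.

```python
def extract_preview(raw: bytes, max_chars: int = 2000) -> str:
    """Decode the head of a blob into a short text preview snippet.

    Tries UTF-8 first; latin-1 is a lossless byte-to-char fallback.
    Truncating at a byte boundary can split a multi-byte character,
    which simply pushes the input to the fallback encoding.
    """
    head = raw[: max_chars * 4]        # byte budget for multi-byte characters
    for encoding in ("utf-8", "latin-1"):
        try:
            return head.decode(encoding)[:max_chars]
        except UnicodeDecodeError:
            continue
    return ""                          # unreachable (latin-1 never fails); kept for safety
```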
Background LLM triage loop (conservative, continuous)
  • A background worker cell runs in batches (rate-limited) over unlabeled blobs from manifest.csv.
  • For each blob, build a compact prompt containing:
    • the current llm_policy.md (rubric)
    • the 4–8 most similar few-shot examples from few_shot_examples.jsonl (balanced across keep/delete)
    • metadata (name, path, size, last_modified, content_type)
    • the preview snippet (strictly limited, e.g., first 1000–2000 chars)
  • Request strict JSON output: {label: keep|delete|human_review, confidence: 0..1, reason: short}.
  • Certainty gating (must pass all):
    • Confidence threshold is high (e.g., ≥ 0.95).
    • Self-consistency: run 3–5 temperature-0 or low-temperature calls with slight paraphrases; all must agree on the same label.
    • Policy alignment: no rubric violations (e.g., never-delete clauses).
  • If all gates pass, write to to_keep.csv or to_delete.csv (source=auto, reason=LLM); otherwise write to to_review.csv (source=auto, reason=LLM uncertain).
  • Cache results in llm_predictions.parquet to avoid re-calling the LLM for the same blob under the same policy/examples.
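The "same blob under the same policy/examples" caching condition can be captured by hashing all three inputs into one cache key, so a policy or few-shot edit automatically invalidates stale predictions. A sketch (the function name and the use of the blob ETag as the identity are assumptions):

```python
import hashlib
import json

def prediction_cache_key(blob_name: str, etag: str,
                         policy_text: str, examples: list) -> str:
    """Build a cache key that changes whenever the blob, the policy,
    or the few-shot pool changes, so stale predictions are never reused."""
    payload = json.dumps(
        {
            "blob": blob_name,
            "etag": etag,
            "policy_md5": hashlib.md5(policy_text.encode("utf-8")).hexdigest(),
            "examples_md5": hashlib.md5(
                json.dumps(examples, sort_keys=True).encode("utf-8")
            ).hexdigest(),
        },
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```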
Human review and learning
  • A notebook UI lists items in to_review.csv with:
    • metadata
    • the text preview snippet
    • the auto reason (if any)
    • keep/delete buttons and an optional short reason field
  • On save:
    • Move the row to to_keep.csv or to_delete.csv (source=human).
    • Append a structured record to few_shot_examples.jsonl with:
      • the label and reason
      • salient cues (e.g., keywords, phrases, doc sections the human noted)
      • a small excerpt (or masked excerpt) that justified the decision
  • Optionally, a helper cell suggests updates to llm_policy.md based on newly labeled examples; the user approves edits before committing.
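Appending a decision to the JSONL store is a one-liner per record. A minimal sketch (the field names and the 500-character excerpt cap are assumptions):

```python
import json

def record_human_decision(path: str, blob_name: str, label: str,
                          reason: str, excerpt: str) -> None:
    """Append one human decision to few_shot_examples.jsonl (one JSON object per line)."""
    record = {
        "blob": blob_name,
        "label": label,            # keep | delete
        "reason": reason,
        "excerpt": excerpt[:500],  # keep exemplars compact for prompting
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```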
Iterate
  • After batches of human decisions, the background LLM loop re-runs automatically (or on demand), now with:
    • a better few-shot pool
    • a refined llm_policy.md
  • Over iterations, the LLM confidently classifies a larger share, shrinking the human_review queue.
Coverage and safeguards
  • A progress cell reports:
    • total blobs, labeled count, remaining unlabeled
    • counts per queue (keep/delete/review)
    • LLM acceptance rate, human workload, error sampling
  • No deletions occur until 100% of the blobs in manifest.csv are labeled (keep or delete).
  • Dry run: show counts and a sample of the delete list for a sanity check.
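The progress cell and the 100%-coverage deletion gate reduce to a small summary over the manifest's label column. An illustrative sketch (assuming unlabeled blobs carry an empty label):

```python
from collections import Counter

def coverage_report(labels):
    """Summarize labeling progress; deletion is gated on 100% coverage
    (every blob labeled keep or delete, nothing unlabeled or in review)."""
    counts = Counter(labels)
    unresolved = counts.get("", 0) + counts.get("human_review", 0)
    return {
        "total": len(labels),
        "keep": counts.get("keep", 0),
        "delete": counts.get("delete", 0),
        "unresolved": unresolved,
        "deletion_allowed": len(labels) > 0 and unresolved == 0,
    }
```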
Final deletion (two-step confirmation)
  • Ensure container/account soft delete is enabled (recommended) for a short retention window.
  • Step 1: Dry-ru...



Copilot AI and others added 11 commits January 28, 2026 16:17
…iage

- Implement BlobCleanupConfig with environment variable support
- Add BlobInventory for blob listing and text preview extraction
- Create LLMTriageEngine with conservative decision making (confidence ≥0.95, self-consistency)
- Implement ReviewUI with ipywidgets for human-in-the-loop review
- Add DeletionManager for safe deletion with dry-run support
- Include QueueManager for CSV-based queue management
- Support multiple LLM providers (OpenAI, Azure OpenAI, Ollama)
- Extract text from multiple formats (txt, pdf, docx, json, etc.)
- Implement few-shot learning from human decisions
- Add full audit trail and policy versioning
- Include comprehensive examples and documentation

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created storage.py: Azure Blob Storage operations
- Created extractors.py: Text extraction with chardet, pdfminer.six, python-docx
- Created policy.py: Policy and examples management with MD5 versioning
- Created queue.py: CSV-based queue management with statistics
- Created audit.py: Audit logging functionality
- Created models.py: Data models (BlobMetadata, QueueEntry, AuditEntry, FewShotExample)
- Created config.py: Configuration management
- Updated __init__.py: Export new modules while maintaining backward compatibility

All modules are pure Python functions with:
- Comprehensive docstrings
- Type hints
- Error handling
- NO Jupyter dependencies
- Fully testable without external services

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created llm.py with LLM client abstraction (OpenAI, Azure OpenAI, Ollama)
- Created triage.py with 3-gate safety system (confidence, self-consistency, NEVER DELETE)
- Created deletion.py with dry-run and coverage verification
- Updated __init__.py to expose new modules
- All modules are pure Python with NO Jupyter dependencies
- Comprehensive type hints and docstrings throughout

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- cleanup.py: Main orchestrator class (BlobCleanup) that coordinates all modules
  - create_manifest(): List blobs and create manifest with previews
  - run_triage_batch(): Run LLM triage on unlabeled blobs
  - show_progress(): Display statistics
  - get_unlabeled_count(): Count unlabeled blobs
  - archive_artifacts(): Archive all artifacts with metadata
  - get_next_review_item(): Get next item for human review
  - Pure Python, NO Jupyter dependencies

- ui.py: Jupyter UI components using ipywidgets
  - ReviewUI: Interactive widget for human review of blobs
  - ProgressDisplay: Statistics display widget
  - create_progress_widget(): Convenience function for progress display
  - Full integration with queue, audit, and policy modules

- __init__.py: Updated to expose new modules

- tests/test_cleanup_ui.py: Comprehensive tests for both modules

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement human-in-the-loop process for Azure Blob Storage cleaning Implement modular Azure Blob Storage cleanup with LLM-assisted triage Jan 28, 2026
Copilot AI requested a review from jwgwalton January 28, 2026 17:01