
Implement modular Azure Blob Storage cleanup with LLM-assisted triage#1

Draft
Copilot wants to merge 12 commits into main from copilot/clean-azure-blob-storage

Conversation

Contributor

Copilot AI commented Jan 28, 2026

Implements a human-in-the-loop system for cleaning Azure Blob Storage containers using LLM pre-triage with conservative decision gates. All state is tracked in local CSV/JSON files; no external infrastructure is required beyond the storage account and an LLM API.

Architecture

Core modules (pure Python, testable without Jupyter):

  • config.py - Environment-based configuration with validation
  • models.py - Data classes (BlobMetadata, TriageDecision, QueueEntry, AuditEntry, FewShotExample)
  • storage.py - Azure Blob operations (list, download, delete)
  • extractors.py - Text preview extraction (txt, json, xml, pdf, docx) with robust encoding detection
  • llm.py - LLM client abstraction supporting OpenAI, Azure OpenAI, and Ollama
  • triage.py - Conservative decision engine with 3-gate safety system
  • queue.py - CSV-based queue management (to_review, to_delete, to_keep)
  • audit.py - Append-only audit logging
  • policy.py - Policy loading with MD5 versioning and few-shot example management
  • deletion.py - Safe deletion with 100% coverage verification
  • cleanup.py - Main orchestrator coordinating all modules

UI layer (Jupyter widgets):

  • ui.py - ReviewUI for human review, ProgressDisplay for statistics

Conservative LLM Triage

Three safety gates ensure only high-confidence decisions are auto-labeled:

  1. Confidence threshold ≥0.95 required
  2. Self-consistency - 3-5 independent LLM runs must unanimously agree
  3. Never-delete constraints - Blocks deletion of legal/PII files, recent files (<7 days), and files in retention folders

Any gate failure → routes to human review queue.
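The gate logic can be sketched as follows. This is an illustrative sketch, not the actual `triage.py` API: the names `gate_decision`, `NEVER_DELETE_PREFIXES`, and the exact threshold values are assumptions based on the description above.

```python
CONFIDENCE_THRESHOLD = 0.95          # Gate 1 threshold (assumed default)
RECENT_DAYS = 7                      # Gate 3 recency window
NEVER_DELETE_PREFIXES = ("legal/", "pii/", "retention/")  # hypothetical protected folders

def gate_decision(runs, blob_path, age_days):
    """Apply the three safety gates to a set of independent LLM runs.

    `runs` is a list of (label, confidence) tuples; any gate failure
    routes the blob to the human review queue.
    """
    labels = {label for label, _ in runs}
    # Gate 2: self-consistency -- every run must agree on one label
    if len(labels) != 1:
        return "human_review"
    label = labels.pop()
    # Gate 1: every run must clear the confidence threshold
    if any(conf < CONFIDENCE_THRESHOLD for _, conf in runs):
        return "human_review"
    # Gate 3: never-delete constraints on recency and protected folders
    if label == "delete" and (
        age_days < RECENT_DAYS
        or any(blob_path.startswith(p) for p in NEVER_DELETE_PREFIXES)
    ):
        return "human_review"
    return label
```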

Usage

from dataatelier import BlobCleanup, Config
from dataatelier.ui import ReviewUI

# Initialize and run batch triage
cleanup = BlobCleanup(Config())
manifest = cleanup.create_manifest()
result = cleanup.run_triage_batch(batch_size=20)

# Human review in notebook
review_ui = ReviewUI(cleanup.config)
display(review_ui.create_widget())

Human decisions automatically update few-shot examples and can trigger policy refinement, improving LLM accuracy iteratively.

Testing

Unit tests cover models, queue operations, policy management, and text extraction without requiring Azure or LLM access.

Files

  • dataatelier/ - 11 modules, ~3000 LOC
  • tests/ - 4 test modules
  • azure_blob_cleanup.ipynb - Clean notebook interface
  • llm_policy.md - Default policy template with conservative rules
  • .env.example - Configuration template
Original prompt

Below is a lightweight, notebook-first plan to clean an Azure Blob Storage container with a human-in-the-loop process and an LLM working in the background. It avoids deploying external infrastructure beyond your storage account and a single local notebook session. The LLM proposes labels conservatively; only completely certain results become keep/delete, and all others go to human_review.

Objectives and constraints

  • A single Jupyter/VS Code notebook drives the entire process end to end.
  • Local CSV/JSON files track state; no databases or extra Azure services.
  • The LLM runs in the background to pre-triage by content; it only promotes items to keep/delete when completely certain, otherwise routing them to human_review.
  • Human decisions update the LLM logic via:
    • an evolving policy/rubric file (rules-as-prompt)
    • a small, curated few-shot examples store (positive and negative exemplars)
Minimal bill of materials

  • Runtime: Python + Jupyter (or VS Code notebooks).
  • Libraries:
    • azure-storage-blob (list, read, delete)
    • pandas (manifests, queues)
    • ipywidgets (light review UI)
    • chardet (robust text decoding; optional)
  • A minimal text extractor for the non-plain-text types you need to review (choose only those present in your data):
    • txt/json/md/log/xml: built-in decoding
    • pdf: pdfminer.six (optional)
    • docx: python-docx (optional)
  • LLM access:
    • Option A (simplest): a cloud API (OpenAI/Azure OpenAI) with strict prompt/JSON output and small contexts
    • Option B (infra-free): a local LLM via Ollama; lower quality but zero external calls
  • No additional Azure services (e.g., Search, Functions) are required.
Core artifacts (local files)

  • manifest.csv: inventory of blobs with metadata + a short text preview/sample.
  • Queues:
    • to_review.csv: items awaiting human judgment.
    • to_delete.csv: items approved for deletion (auto or human).
    • to_keep.csv: items explicitly kept (auto or human).
  • llm_policy.md: human-readable rubric defining what constitutes keep vs delete; updated over time.
  • few_shot_examples.jsonl: compact set of labeled examples (keep/delete) with reasons; appended to by human labeling.
  • llm_predictions.parquet: cached LLM outputs per blob (label, confidence, reason, versions of the policy/examples used).
  • audit_log.csv: append-only log of each decision (who/what, when, before/after).
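The CSV queues can be maintained with a small append helper. This is a minimal sketch, assuming a hypothetical `QueueRow` schema (the real `queue.py` columns may differ):

```python
import csv
import os
from dataclasses import dataclass, asdict, fields

@dataclass
class QueueRow:
    name: str
    size: int
    last_modified: str
    label: str        # keep | delete | human_review
    source: str       # auto | human
    reason: str

def append_queue(path: str, row: QueueRow) -> None:
    """Append one row to a CSV queue file, writing the header on first use."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(QueueRow)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(row))
```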
Process flow

Inventory and preview
  • List blobs (optional prefix, optional max N for a pilot).
  • For text-like files, read the first N KB (e.g., 2–4 KB) to create a preview snippet; store it in manifest.csv.
  • For binaries you care about, optionally extract a small text sample using the minimal extractor for that file type; otherwise rely on the filename plus human review.
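The preview step might decode blob bytes as sketched below. This is illustrative only: it tries UTF-8 and falls back to latin-1 (which never fails), whereas the real extractor could slot in chardet for smarter detection.

```python
def extract_preview(raw: bytes, max_chars: int = 2000) -> str:
    """Decode the head of a blob into a short text preview snippet.

    Tries UTF-8 first; latin-1 is a lossless byte-to-char fallback.
    Truncating at a byte boundary can split a multi-byte character,
    which simply pushes the input to the fallback encoding.
    """
    head = raw[: max_chars * 4]        # byte budget for multi-byte characters
    for encoding in ("utf-8", "latin-1"):
        try:
            return head.decode(encoding)[:max_chars]
        except UnicodeDecodeError:
            continue
    return ""                          # unreachable (latin-1 never fails); kept for safety
```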
Background LLM triage loop (conservative, continuous)
  • A background worker cell runs in batches (rate-limited) over unlabeled blobs from manifest.csv.
  • For each blob, build a compact prompt containing:
    • the current llm_policy.md (rubric)
    • the 4–8 most similar few-shot examples from few_shot_examples.jsonl (balanced across keep/delete)
    • metadata (name, path, size, last_modified, content_type)
    • the preview snippet (strictly limited, e.g., first 1000–2000 chars)
  • Request strict JSON output: {label: keep|delete|human_review, confidence: 0..1, reason: short}.
  • Certainty gating (must pass all):
    • Confidence threshold is high (e.g., ≥ 0.95).
    • Self-consistency: run 3–5 temperature-0 or low-temperature calls with slight paraphrases; all must agree on the same label.
    • Policy alignment: no rubric violations (e.g., never-delete clauses).
  • If all gates pass, write to to_keep.csv or to_delete.csv (source=auto, reason=LLM); otherwise write to to_review.csv (source=auto, reason=LLM uncertain).
  • Cache results in llm_predictions.parquet to avoid re-calling the LLM for the same blob under the same policy/examples.
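The "same blob under the same policy/examples" caching condition can be captured by hashing all three inputs into one cache key, so a policy or few-shot edit automatically invalidates stale predictions. A sketch (the function name and the use of the blob ETag as the identity are assumptions):

```python
import hashlib
import json

def prediction_cache_key(blob_name: str, etag: str,
                         policy_text: str, examples: list) -> str:
    """Build a cache key that changes whenever the blob, the policy,
    or the few-shot pool changes, so stale predictions are never reused."""
    payload = json.dumps(
        {
            "blob": blob_name,
            "etag": etag,
            "policy_md5": hashlib.md5(policy_text.encode("utf-8")).hexdigest(),
            "examples_md5": hashlib.md5(
                json.dumps(examples, sort_keys=True).encode("utf-8")
            ).hexdigest(),
        },
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```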
Human review and learning
  • A notebook UI lists items in to_review.csv with:
    • metadata
    • the text preview snippet
    • the auto reason (if any)
    • keep/delete buttons and an optional short reason field
  • On save:
    • Move the row to to_keep.csv or to_delete.csv (source=human).
    • Append a structured record to few_shot_examples.jsonl with:
      • the label and reason
      • salient cues (e.g., keywords, phrases, doc sections the human noted)
      • a small excerpt (or masked excerpt) that justified the decision
  • Optionally, a helper cell suggests updates to llm_policy.md based on newly labeled examples; the user approves edits before committing.
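Appending a decision to the JSONL store is a one-liner per record. A minimal sketch (the field names and the 500-character excerpt cap are assumptions):

```python
import json

def record_human_decision(path: str, blob_name: str, label: str,
                          reason: str, excerpt: str) -> None:
    """Append one human decision to few_shot_examples.jsonl (one JSON object per line)."""
    record = {
        "blob": blob_name,
        "label": label,            # keep | delete
        "reason": reason,
        "excerpt": excerpt[:500],  # keep exemplars compact for prompting
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```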
Iterate
  • After batches of human decisions, the background LLM loop re-runs automatically (or on demand), now with:
    • a better few-shot pool
    • a refined llm_policy.md
  • Over iterations, the LLM confidently classifies a larger share, shrinking the human_review queue.
Coverage and safeguards
  • A progress cell reports:
    • total blobs, labeled count, remaining unlabeled
    • counts per queue (keep/delete/review)
    • LLM acceptance rate, human workload, error sampling
  • No deletions occur until 100% of the blobs in manifest.csv are labeled (keep or delete).
  • Dry run: show counts and a sample of the delete list for a sanity check.
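The progress cell and the 100%-coverage deletion gate reduce to a small summary over the manifest's label column. An illustrative sketch (assuming unlabeled blobs carry an empty label):

```python
from collections import Counter

def coverage_report(labels):
    """Summarize labeling progress; deletion is gated on 100% coverage
    (every blob labeled keep or delete, nothing unlabeled or in review)."""
    counts = Counter(labels)
    unresolved = counts.get("", 0) + counts.get("human_review", 0)
    return {
        "total": len(labels),
        "keep": counts.get("keep", 0),
        "delete": counts.get("delete", 0),
        "unresolved": unresolved,
        "deletion_allowed": len(labels) > 0 and unresolved == 0,
    }
```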
Final deletion (two-step confirmation)
  • Ensure container/account soft delete is enabled (recommended) for a short retention window.
  • Step 1: Dry-ru...



Copilot AI and others added 11 commits January 28, 2026 16:17
…iage

- Implement BlobCleanupConfig with environment variable support
- Add BlobInventory for blob listing and text preview extraction
- Create LLMTriageEngine with conservative decision making (confidence ≥0.95, self-consistency)
- Implement ReviewUI with ipywidgets for human-in-the-loop review
- Add DeletionManager for safe deletion with dry-run support
- Include QueueManager for CSV-based queue management
- Support multiple LLM providers (OpenAI, Azure OpenAI, Ollama)
- Extract text from multiple formats (txt, pdf, docx, json, etc.)
- Implement few-shot learning from human decisions
- Add full audit trail and policy versioning
- Include comprehensive examples and documentation

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created storage.py: Azure Blob Storage operations
- Created extractors.py: Text extraction with chardet, pdfminer.six, python-docx
- Created policy.py: Policy and examples management with MD5 versioning
- Created queue.py: CSV-based queue management with statistics
- Created audit.py: Audit logging functionality
- Created models.py: Data models (BlobMetadata, QueueEntry, AuditEntry, FewShotExample)
- Created config.py: Configuration management
- Updated __init__.py: Export new modules while maintaining backward compatibility

All modules are pure Python functions with:
- Comprehensive docstrings
- Type hints
- Error handling
- NO Jupyter dependencies
- Fully testable without external services

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created llm.py with LLM client abstraction (OpenAI, Azure OpenAI, Ollama)
- Created triage.py with 3-gate safety system (confidence, self-consistency, NEVER DELETE)
- Created deletion.py with dry-run and coverage verification
- Updated __init__.py to expose new modules
- All modules are pure Python with NO Jupyter dependencies
- Comprehensive type hints and docstrings throughout

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- cleanup.py: Main orchestrator class (BlobCleanup) that coordinates all modules
  - create_manifest(): List blobs and create manifest with previews
  - run_triage_batch(): Run LLM triage on unlabeled blobs
  - show_progress(): Display statistics
  - get_unlabeled_count(): Count unlabeled blobs
  - archive_artifacts(): Archive all artifacts with metadata
  - get_next_review_item(): Get next item for human review
  - Pure Python, NO Jupyter dependencies

- ui.py: Jupyter UI components using ipywidgets
  - ReviewUI: Interactive widget for human review of blobs
  - ProgressDisplay: Statistics display widget
  - create_progress_widget(): Convenience function for progress display
  - Full integration with queue, audit, and policy modules

- __init__.py: Updated to expose new modules

- tests/test_cleanup_ui.py: Comprehensive tests for both modules

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement human-in-the-loop process for Azure Blob Storage cleaning Implement modular Azure Blob Storage cleanup with LLM-assisted triage Jan 28, 2026
Copilot AI requested a review from jwgwalton January 28, 2026 17:01