Implement modular Azure Blob Storage cleanup with LLM-assisted triage#1
…iage

- Implement BlobCleanupConfig with environment variable support
- Add BlobInventory for blob listing and text preview extraction
- Create LLMTriageEngine with conservative decision making (confidence ≥ 0.95, self-consistency)
- Implement ReviewUI with ipywidgets for human-in-the-loop review
- Add DeletionManager for safe deletion with dry-run support
- Include QueueManager for CSV-based queue management
- Support multiple LLM providers (OpenAI, Azure OpenAI, Ollama)
- Extract text from multiple formats (txt, pdf, docx, json, etc.)
- Implement few-shot learning from human decisions
- Add full audit trail and policy versioning
- Include comprehensive examples and documentation

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created storage.py: Azure Blob Storage operations
- Created extractors.py: Text extraction with chardet, pdfminer.six, python-docx
- Created policy.py: Policy and examples management with MD5 versioning
- Created queue.py: CSV-based queue management with statistics
- Created audit.py: Audit logging functionality
- Created models.py: Data models (BlobMetadata, QueueEntry, AuditEntry, FewShotExample)
- Created config.py: Configuration management
- Updated __init__.py: Export new modules while maintaining backward compatibility

All modules are pure Python functions with:

- Comprehensive docstrings
- Type hints
- Error handling
- NO Jupyter dependencies
- Fully testable without external services

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- Created llm.py with LLM client abstraction (OpenAI, Azure OpenAI, Ollama)
- Created triage.py with 3-gate safety system (confidence, self-consistency, NEVER DELETE)
- Created deletion.py with dry-run and coverage verification
- Updated __init__.py to expose new modules
- All modules are pure Python with NO Jupyter dependencies
- Comprehensive type hints and docstrings throughout

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
- cleanup.py: Main orchestrator class (BlobCleanup) that coordinates all modules
  - create_manifest(): List blobs and create manifest with previews
  - run_triage_batch(): Run LLM triage on unlabeled blobs
  - show_progress(): Display statistics
  - get_unlabeled_count(): Count unlabeled blobs
  - archive_artifacts(): Archive all artifacts with metadata
  - get_next_review_item(): Get next item for human review
  - Pure Python, NO Jupyter dependencies
- ui.py: Jupyter UI components using ipywidgets
  - ReviewUI: Interactive widget for human review of blobs
  - ProgressDisplay: Statistics display widget
  - create_progress_widget(): Convenience function for progress display
  - Full integration with queue, audit, and policy modules
- __init__.py: Updated to expose new modules
- tests/test_cleanup_ui.py: Comprehensive tests for both modules

Co-authored-by: jwgwalton <7936236+jwgwalton@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement human-in-the-loop process for Azure Blob Storage cleaning" to "Implement modular Azure Blob Storage cleanup with LLM-assisted triage" on Jan 28, 2026.
Implements a human-in-the-loop system for cleaning Azure Blob Storage containers using LLM pre-triage with conservative decision gates. All state is tracked in local CSV/JSON files, with no external infrastructure required beyond the storage account and an LLM API.
Architecture
Core modules (pure Python, testable without Jupyter):
- config.py - Environment-based configuration with validation
- models.py - Data classes (BlobMetadata, TriageDecision, QueueEntry, AuditEntry, FewShotExample)
- storage.py - Azure Blob operations (list, download, delete)
- extractors.py - Text preview extraction (txt, json, xml, pdf, docx) with robust encoding detection
- llm.py - LLM client abstraction supporting OpenAI, Azure OpenAI, and Ollama
- triage.py - Conservative decision engine with 3-gate safety system
- queue.py - CSV-based queue management (to_review, to_delete, to_keep)
- audit.py - Append-only audit logging
- policy.py - Policy loading with MD5 versioning and few-shot example management
- deletion.py - Safe deletion with 100% coverage verification
- cleanup.py - Main orchestrator coordinating all modules

UI layer (Jupyter widgets):

- ui.py - ReviewUI for human review, ProgressDisplay for statistics

Conservative LLM Triage
Three safety gates ensure only high-confidence decisions are auto-labeled: a high confidence threshold (≥ 0.95), self-consistency across repeated low-temperature calls, and policy alignment (no never-delete violations). Any gate failure routes the item to the human review queue.
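The gating logic can be sketched as a single pure function. This is an illustrative sketch, not the repo's actual `triage.py`: `TriageVote`, `gate_decision`, and the `NEVER_DELETE_SUBSTRINGS` list are hypothetical names, and the never-delete check stands in for whatever rules the policy file defines.

```python
from dataclasses import dataclass

@dataclass
class TriageVote:
    """One LLM call's parsed JSON output (hypothetical shape)."""
    label: str         # "keep" | "delete" | "human_review"
    confidence: float  # 0..1

# Hypothetical never-delete clauses, stand-ins for rules in llm_policy.md.
NEVER_DELETE_SUBSTRINGS = ("backup", "legal_hold")

def gate_decision(votes, blob_name, min_confidence=0.95):
    """Apply the three safety gates; any failure routes to human review."""
    # Gate 1: self-consistency -- every repeated call must agree on the label.
    labels = {v.label for v in votes}
    if len(labels) != 1:
        return "human_review"
    label = labels.pop()
    # Gate 2: confidence threshold must hold for every vote.
    if any(v.confidence < min_confidence for v in votes):
        return "human_review"
    # Gate 3: policy alignment -- never auto-delete protected blobs.
    if label == "delete" and any(s in blob_name for s in NEVER_DELETE_SUBSTRINGS):
        return "human_review"
    return label
```

Keeping the gates in one side-effect-free function makes the conservative behavior easy to unit-test without any LLM or Azure access.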
Usage
Human decisions automatically update few-shot examples and can trigger policy refinement, improving LLM accuracy iteratively.
Testing
Unit tests cover models, queue operations, policy management, and text extraction without requiring Azure or LLM access.
Files
- dataatelier/ - 11 modules, ~3000 LOC
- tests/ - 4 test modules
- azure_blob_cleanup.ipynb - Clean notebook interface
- llm_policy.md - Default policy template with conservative rules
- .env.example - Configuration template

Original prompt
Below is a lightweight, notebook-first plan to clean an Azure Blob Storage container with a human-in-the-loop process and an LLM working in the background. It avoids deploying external infrastructure beyond your storage account and a single local notebook session. The LLM proposes labels conservatively; only completely certain results become keep/delete, all others go to human_review.
Objectives and constraints
Single Jupyter/VS Code notebook drives the entire process end-to-end.
Local CSV/JSON files track state; no databases or extra Azure services.
LLM runs in the background to pre-triage by content; only promotes items to keep/delete when completely certain, otherwise routes to human_review.
Human decisions update the LLM logic via:
an evolving policy/rubric file (rules-as-prompt)
a small, curated few-shot examples store (positive and negative exemplars)
Minimal bill of materials
Runtime: Python + Jupyter (or VS Code notebooks).
Libraries:
azure-storage-blob (list, read, delete)
pandas (manifests, queues)
ipywidgets (light review UI)
chardet (robust text decoding; optional)
A minimal text extractor for non-plain-text types you need to review (choose only those you have in your data):
txt/json/md/log/xml: built-in decoding
pdf: pdfminer.six (optional)
docx: python-docx (optional)
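A minimal dispatch over file extensions could look like the sketch below. The function name is illustrative; the pdf/docx branches (pdfminer.six, python-docx) are omitted and would only be wired in if those formats exist in your data, with unsupported types falling back to filename plus human review.

```python
import json
from pathlib import PurePosixPath

def extract_preview(blob_name: str, data: bytes, max_chars: int = 2000) -> str:
    """Best-effort text preview for a blob; unknown types return an empty preview."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    if suffix in {".txt", ".md", ".log", ".xml", ".csv"}:
        # Built-in decoding; errors="replace" keeps mis-encoded bytes from raising.
        return data.decode("utf-8", errors="replace")[:max_chars]
    if suffix == ".json":
        try:
            # Pretty-print so the reviewer sees structure, not one long line.
            return json.dumps(json.loads(data), indent=2)[:max_chars]
        except ValueError:
            return data.decode("utf-8", errors="replace")[:max_chars]
    # pdf -> pdfminer.six, docx -> python-docx would slot in here when installed.
    return ""  # binary / unsupported: rely on filename + human review
```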
LLM access:
Option A (simplest): Cloud API (OpenAI/Azure OpenAI) with strict prompt/JSON output, small contexts
Option B (infra-free): Local LLM via Ollama; smaller quality but zero external calls
No additional Azure services (e.g., Search, Functions) required.
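The two LLM options fit behind one tiny interface, so the triage loop never knows which provider is wired in. A sketch, not the repo's `llm.py`: the `OllamaClient` body uses Ollama's real `/api/generate` endpoint, while `FakeClient` is a hypothetical deterministic stand-in for offline testing; an OpenAI/Azure OpenAI adapter would implement the same `complete` method.

```python
import json
import urllib.request
from typing import Protocol

class LLMClient(Protocol):
    """Minimal interface every provider adapter implements."""
    def complete(self, prompt: str) -> str: ...

class OllamaClient:
    """Option B: local model via Ollama's /api/generate endpoint."""
    def __init__(self, model: str = "llama3", host: str = "http://localhost:11434"):
        self.model, self.host = model, host

    def complete(self, prompt: str) -> str:
        body = json.dumps({"model": self.model, "prompt": prompt,
                           "stream": False}).encode()
        req = urllib.request.Request(f"{self.host}/api/generate", data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["response"]

class FakeClient:
    """Deterministic stand-in so the triage loop is testable offline."""
    def __init__(self, canned: str):
        self.canned = canned

    def complete(self, prompt: str) -> str:
        return self.canned
```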
Core artifacts (local files)
manifest.csv: inventory of blobs with metadata + short text preview/sample.
queues:
to_review.csv: items awaiting human judgment.
to_delete.csv: items approved for deletion (auto or human).
to_keep.csv: items explicitly kept (auto or human).
llm_policy.md: human-readable rubric that defines what constitutes keep vs delete; updated over time.
few_shot_examples.jsonl: compact set of labeled examples (keep/delete) with reasons; appended by human labeling.
llm_predictions.parquet: cached LLM outputs for blobs (label, confidence, reason, versions of policy/examples used).
audit_log.csv: append-only log of each decision (who/what, when, before/after).
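The queue files share one row shape, so a single dataclass plus an append helper covers all three CSVs. This is a minimal sketch with a hypothetical schema; the actual `models.py`/`queue.py` columns may differ.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path

@dataclass
class QueueEntry:
    """One row in to_review.csv / to_keep.csv / to_delete.csv (hypothetical schema)."""
    blob_name: str
    label: str       # keep | delete | human_review
    source: str      # auto | human
    reason: str
    decided_at: str  # ISO-8601 timestamp

def append_entry(path: Path, entry: QueueEntry) -> None:
    """Append a row, writing the header only when the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(QueueEntry)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(entry))
```

Append-only writes keep the queues (and the audit log, which uses the same pattern) safe to update from both the background loop and the review UI.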
Process flow
Inventory and preview
List blobs (optional prefix, optional max N for pilot).
For text-like files, read the first N KB (e.g., 2–4 KB) to create a preview snippet; store in manifest.csv.
For binaries you care about, optionally extract a small text sample using the minimal extractor for that file type; otherwise rely on filename + human review.
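The inventory step reduces to one row-builder plus the Azure listing calls. Below, `build_manifest_row` and `PREVIEW_BYTES` are illustrative names; the commented-out `list_blobs`/`download_blob` calls are the real azure-storage-blob API but are not exercised here.

```python
PREVIEW_BYTES = 4096  # read only the first few KB of each blob

def build_manifest_row(name, size, last_modified, content_type, head: bytes) -> dict:
    """One manifest.csv row: metadata plus a short decoded preview snippet."""
    return {
        "name": name,
        "size": size,
        "last_modified": last_modified,
        "content_type": content_type,
        "preview": head.decode("utf-8", errors="replace")[:2000],
        "label": "",  # filled in later by LLM triage or human review
    }

# With azure-storage-blob (sketch, not exercised here):
# from azure.storage.blob import ContainerClient
# cc = ContainerClient.from_connection_string(conn_str, container_name)
# rows = []
# for props in cc.list_blobs(name_starts_with=prefix):
#     head = cc.download_blob(props.name, offset=0, length=PREVIEW_BYTES).readall()
#     rows.append(build_manifest_row(props.name, props.size, props.last_modified,
#                                    props.content_settings.content_type, head))
```

Ranged downloads (`offset=0, length=PREVIEW_BYTES`) keep the pilot cheap even on containers holding large blobs.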
Background LLM triage loop (conservative, continuous)
A background worker cell runs in batches (rate-limited) over unlabeled blobs from manifest.csv.
For each blob:
Build a compact prompt:
The current llm_policy.md (rubric)
4–8 most similar few-shot examples from few_shot_examples.jsonl (balanced across keep/delete)
Metadata (name, path, size, last_modified, content_type)
Preview snippet (strictly limited, e.g., first 1000–2000 chars)
Request strict JSON output: {label: keep|delete|human_review, confidence: 0..1, reason: short}.
Certainty gating (must pass all):
Confidence threshold high (e.g., ≥ 0.95).
Self-consistency: run 3–5 temperature-0 or low-temperature calls with slight paraphrase; all must agree on the same label.
Policy alignment: no rubric violations (e.g., never-delete clauses).
If all gates pass: write to to_keep.csv or to_delete.csv (source=auto, reason=LLM).
Otherwise: to_review.csv (source=auto, reason=LLM uncertain).
Cache results into llm_predictions.parquet to avoid re-calling the LLM for the same blob under the same policy/examples.
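The cache key must include the policy/examples version, so that editing the rubric automatically invalidates stale predictions. A minimal in-memory sketch (the real store is the parquet file; `cached_triage` and `policy_version` are illustrative names):

```python
import hashlib

def policy_version(policy_text: str, examples_text: str) -> str:
    """MD5 over the rubric plus few-shot pool; any edit changes the version."""
    return hashlib.md5((policy_text + "\n" + examples_text).encode()).hexdigest()

def cached_triage(cache: dict, blob_name: str, version: str, call_llm):
    """Reuse a cached prediction when blob + policy version match; else call the LLM."""
    key = (blob_name, version)
    if key not in cache:
        cache[key] = call_llm(blob_name)  # only pay for an LLM call on a miss
    return cache[key]
```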
Human review and learning
Notebook UI lists items in to_review.csv with:
Metadata
Text preview snippet
Auto reason (if any)
Buttons: keep/delete; optional short reason
On save:
Move row to to_keep.csv or to_delete.csv (source=human).
Append a structured record to few_shot_examples.jsonl with:
label, reason
salient cues (e.g., keywords, phrases, doc sections the human noted)
a small excerpt (or masked excerpt) that justified the decision
Optionally, a helper cell suggests updates to llm_policy.md based on newly labeled examples; user approves edits before committing.
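The on-save append to the few-shot store is one JSON object per line. A sketch with a hypothetical record shape; the actual fields (salient cues, masking) would follow whatever the review UI captures:

```python
import json
from pathlib import Path

def record_human_decision(path: Path, blob_name: str, label: str,
                          reason: str, excerpt: str) -> None:
    """Append one labeled exemplar to few_shot_examples.jsonl (JSONL: one object per line)."""
    record = {
        "blob_name": blob_name,
        "label": label,            # keep | delete
        "reason": reason,          # short human-supplied justification
        "excerpt": excerpt[:500],  # small (optionally masked) excerpt that justified it
    }
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```

JSONL keeps the store trivially appendable and easy to sample from when building the 4-8 example prompt pool.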
Iterate
After batches of human decisions, the background LLM loop re-runs automatically (or on-demand), now with:
a better few-shot pool
a refined llm_policy.md
Over iterations, the LLM confidently classifies a larger share, shrinking the human_review queue.
Coverage and safeguards
A progress cell reports:
total blobs, labeled count, remaining unlabeled
counts per queue (keep/delete/review)
LLM acceptance rate, human workload, error sampling
No deletions occur until 100% of blobs in manifest.csv are labeled (keep or delete).
Dry-run: Show counts and a sample of delete list for sanity check.
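The 100%-labeled precondition is a straightforward set check. An illustrative sketch (the function name is not from the repo); it also flags blobs that somehow landed in both queues:

```python
def verify_full_coverage(manifest_names, keep_names, delete_names) -> dict:
    """No deletions until every manifest blob carries exactly one keep/delete label."""
    keep, delete = set(keep_names), set(delete_names)
    missing = set(manifest_names) - (keep | delete)   # still unlabeled
    conflicting = keep & delete                       # labeled both ways
    return {
        "ok": not missing and not conflicting,
        "missing": sorted(missing),
        "conflicting": sorted(conflicting),
    }
```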
Final deletion (two-step confirmation)
Ensure container/account soft delete is enabled (recommended) for a short retention window.
Step 1: Dry-ru...