feat: Investigation Snapshot Export and Import (fn-39) #107

Merged
bordumb merged 6 commits into main from feat/fn-39-investigation-snapshot
Jan 31, 2026

bordumb (Owner) commented Jan 31, 2026

Summary

  • Add portable .tar.gz snapshot archives for investigation export/import
  • Enable cross-environment investigation sharing via CLI, SDK, and API
  • Add schema versioning (v1.0) for forward compatibility; query results are stored as Parquet when pyarrow is installed, with a JSON fallback otherwise

Changes

Core

  • snapshot_schema.py - Schema definitions, metadata validation, archive paths
  • snapshot_builder.py - Builds tar.gz archives with evidence, lineage, metadata
  • snapshot_importer.py - Validates and imports archives as replay investigations

API

  • GET /investigations/{id}/snapshot - Stream download tar.gz archive
  • POST /investigations/import - Upload and import archive

CLI

  • dataing run snapshot <id> - Export investigation to local file
  • dataing run import <file> - Import archive as replay investigation

SDK

  • download_snapshot() / import_snapshot() methods

Test plan

  • Unit tests for schema, builder, importer, CLI (177 passed)

bordumb and others added 6 commits January 31, 2026 17:31
Define canonical snapshot archive structure and metadata.json schema:
- SNAPSHOT_SCHEMA_VERSION constant (1.0)
- SnapshotMetadata Pydantic model with all required fields
- ArchivePaths helper with well-known archive paths
- validate_metadata() for import validation
- json_schema() for external tooling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
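
The metadata check described in this commit could look roughly like the sketch below. It is not the code from snapshot_schema.py: it uses a stdlib dataclass as a stand-in for the Pydantic model, and the `investigation_id` and `files` fields, the string form of the version, and the `validate_metadata(raw)` signature are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: the version is stored as a string; the PR only says "1.0".
SNAPSHOT_SCHEMA_VERSION = "1.0"
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}


class UnsupportedSchemaVersionError(ValueError):
    """Raised when metadata declares a schema version we cannot read."""


@dataclass(frozen=True)
class SnapshotMetadata:
    # Only schema_version is named in the PR; the other fields are hypothetical.
    schema_version: str
    investigation_id: str
    files: list[str] = field(default_factory=list)


def validate_metadata(raw: dict) -> SnapshotMetadata:
    """Check required keys and the schema version, then build the model."""
    missing = {"schema_version", "investigation_id"} - raw.keys()
    if missing:
        raise ValueError(f"metadata.json is missing keys: {sorted(missing)}")
    if raw["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchemaVersionError(raw["schema_version"])
    return SnapshotMetadata(
        schema_version=raw["schema_version"],
        investigation_id=raw["investigation_id"],
        files=list(raw.get("files", [])),
    )
```

Keeping the supported versions in a set means a later release can accept both old and new archives by adding an entry rather than changing the check.
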
Implement SnapshotBuilder class that creates tar.gz archives:
- Evidence items serialized as numbered JSON files
- Query results stored as Parquet when pyarrow is available
- Graceful JSON fallback when pyarrow not installed
- metadata.json with complete file inventory and schema_version
- SnapshotSizeExceededError when archive exceeds max_size_bytes
- pyarrow added as optional dependency under [snapshot] extra

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
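
As a rough illustration of the builder behaviour listed above, the sketch below packs evidence into an in-memory tar.gz using only the standard library. The `build_snapshot` function, the `evidence/NNNN.json` layout, and the metadata keys are assumptions, not the actual SnapshotBuilder API, and the Parquet path is left out entirely.

```python
import io
import json
import tarfile


class SnapshotSizeExceededError(Exception):
    """Raised when the finished archive is larger than max_size_bytes."""


def _add_bytes(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    # In-memory members need a TarInfo with an explicit size.
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_snapshot(evidence: list[dict], max_size_bytes: int) -> bytes:
    """Pack evidence items and a metadata.json inventory into a tar.gz blob."""
    # Hypothetical layout: one numbered JSON file per evidence item.
    names = [f"evidence/{i:04d}.json" for i in range(len(evidence))]
    metadata = {"schema_version": "1.0", "files": names}

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        _add_bytes(tar, "metadata.json", json.dumps(metadata).encode())
        for name, item in zip(names, evidence):
            _add_bytes(tar, name, json.dumps(item).encode())

    archive = buffer.getvalue()
    if len(archive) > max_size_bytes:
        raise SnapshotSizeExceededError(
            f"archive is {len(archive)} bytes, limit is {max_size_bytes}"
        )
    return archive
```

Checking the size after compression is one possible reading of "exceeds max_size_bytes"; the real builder may enforce the limit earlier or on uncompressed data.
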
Add API endpoints for snapshot archive export and import:
- GET /investigations/{id}/snapshot returns tar.gz with correct headers
- POST /investigations/import accepts tar.gz upload
- Validates metadata.json schema version on import
- Imported investigations marked with is_replay: true
- 404 for non-existent investigations
- 413 for files exceeding max_size_bytes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement SnapshotImporter class for importing snapshot archives:
- validate_archive() checks the schema version and rejects path traversal
- extract_evidence() extracts all evidence items
- extract_prompts() extracts prompt templates
- import_investigation() returns ImportResult with replay status
- Supports Parquet reading when pyarrow available
- Added InvalidSnapshotError, UnsupportedSchemaVersionError
- Added SUPPORTED_SCHEMA_VERSIONS set for forward compatibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
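
The two checks named for validate_archive() can be sketched as follows. This is an illustrative stand-in, not the SnapshotImporter code: the function signature, the exception hierarchy (UnsupportedSchemaVersionError as a subclass of InvalidSnapshotError), and the exact traversal rules are assumptions.

```python
import io
import json
import posixpath
import tarfile

SUPPORTED_SCHEMA_VERSIONS = {"1.0"}


class InvalidSnapshotError(Exception):
    """Raised when an archive is malformed or unsafe to import."""


class UnsupportedSchemaVersionError(InvalidSnapshotError):
    """Raised when metadata.json declares an unknown schema version."""


def _is_safe_member(member: tarfile.TarInfo) -> bool:
    # Reject symlinks, hard links, and device nodes outright.
    if not (member.isfile() or member.isdir()):
        return False
    if posixpath.isabs(member.name):
        return False
    # After normalization, a path that escapes the root starts with "..".
    normalized = posixpath.normpath(member.name)
    return normalized != ".." and not normalized.startswith("../")


def validate_archive(data: bytes) -> dict:
    """Return the parsed metadata.json after safety and version checks."""
    try:
        tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:gz")
    except (tarfile.TarError, OSError) as exc:
        raise InvalidSnapshotError("not a readable tar.gz archive") from exc
    with tar:
        for member in tar.getmembers():
            if not _is_safe_member(member):
                raise InvalidSnapshotError(f"unsafe archive member: {member.name!r}")
        try:
            handle = tar.extractfile("metadata.json")
        except KeyError as exc:
            raise InvalidSnapshotError("metadata.json is missing") from exc
        if handle is None:
            raise InvalidSnapshotError("metadata.json is not a regular file")
        metadata = json.load(handle)
    if not isinstance(metadata, dict):
        raise InvalidSnapshotError("metadata.json must contain a JSON object")
    version = metadata.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchemaVersionError(f"unsupported schema version: {version!r}")
    return metadata
```

Validating every member name before reading any content means a hostile archive is rejected before anything is extracted.
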
Add CLI commands for snapshot export and import:
- `dataing run snapshot <id>` downloads tar.gz archive
- `dataing run import <file>` uploads and imports archive
- --output flag for custom output path
- --max-size flag for size limit in MB
- Rich-formatted progress and results display

SDK methods added:
- download_snapshot(investigation_id) -> bytes
- import_snapshot(file_data) -> dict

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
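
A script could combine the two SDK methods along the lines of the sketch below. Only the method names and their `bytes`/`dict` types come from this PR; the `SnapshotClient` protocol, the helper functions, applying the size limit on export, and treating 1 MB as 1024 * 1024 bytes are assumptions for illustration.

```python
from pathlib import Path
from typing import Protocol


class SnapshotClient(Protocol):
    """Hypothetical interface covering the two SDK methods added here."""

    def download_snapshot(self, investigation_id: str) -> bytes: ...

    def import_snapshot(self, file_data: bytes) -> dict: ...


def export_to_file(
    client: SnapshotClient, investigation_id: str, output: Path, max_size_mb: int
) -> Path:
    """Download a snapshot and write it to disk, enforcing a size limit."""
    data = client.download_snapshot(investigation_id)
    # Assumption: the limit is given in MB and 1 MB means 1024 * 1024 bytes.
    if len(data) > max_size_mb * 1024 * 1024:
        raise ValueError(f"snapshot is {len(data)} bytes, limit is {max_size_mb} MB")
    output.write_bytes(data)
    return output


def import_from_file(client: SnapshotClient, path: Path) -> dict:
    """Read a local archive and hand it to the SDK for import."""
    return client.import_snapshot(path.read_bytes())
```

Typing the client as a protocol keeps the helpers testable with a fake object, without needing a running API.
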
Add unit tests for CLI snapshot commands:
- TestRunSnapshotCommand with 4 tests
- TestRunImportCommand with 3 tests
- Tests cover success, error handling, max-size limits, not found

Note: Core module tests were already added in previous tasks
(test_snapshot_schema.py, test_snapshot_builder.py, test_snapshot_importer.py)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
bordumb self-assigned this Jan 31, 2026

vercel bot commented Jan 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dataing | Ready | Preview, Comment | Jan 31, 2026 4:55pm |
| dataing-app | Ready | Preview, Comment | Jan 31, 2026 4:55pm |
| dataing-docs | Ready | Preview, Comment | Jan 31, 2026 4:55pm |

bordumb merged commit 6bf5bab into main Jan 31, 2026
5 checks passed
bordumb deleted the feat/fn-39-investigation-snapshot branch January 31, 2026 23:53
bordumb pushed a commit that referenced this pull request Feb 1, 2026
# [1.18.0](v1.17.0...v1.18.0) (2026-01-31)

### Features

* Investigation Snapshot Export and Import (fn-39) ([#107](#107)) ([6bf5bab](6bf5bab))