feat: Trace-to-Test Codify - Generate Regression Tests from Investigations #105
Merged
Implement abstract representation of data quality tests that can be rendered to multiple frameworks (Great Expectations, dbt, Soda, SQL).
- Add AssertionType enum with 8 assertion types: not_null, unique, accepted_values, in_range, row_count_change, freshness, referential_integrity, custom_sql
- Add ThresholdType enum and AssertionThreshold model for configuring test pass/fail thresholds
- Add DataQualityTest model with:
  - test_id, name, description for identification
  - assertion_type, table, column for scope
  - parameters, threshold, sql_expression for configuration
  - severity (warn/fail) for pipeline behavior
  - source_investigation_id, failure_description for traceability
- Add factory methods for each assertion type
- All models are frozen (immutable) for thread safety
Refs: fn-51.1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
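To make the shape of the model concrete, here is a minimal sketch of a frozen, framework-agnostic test record. It uses stdlib dataclasses as a stand-in (the PR does not show which modelling library is used), omits the `threshold` field, and the `Severity` enum and the signature of the `not_null` factory are assumptions made for illustration, not the merged code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Optional


class AssertionType(str, Enum):
    """The 8 assertion types listed in the commit message."""
    NOT_NULL = "not_null"
    UNIQUE = "unique"
    ACCEPTED_VALUES = "accepted_values"
    IN_RANGE = "in_range"
    ROW_COUNT_CHANGE = "row_count_change"
    FRESHNESS = "freshness"
    REFERENTIAL_INTEGRITY = "referential_integrity"
    CUSTOM_SQL = "custom_sql"


class Severity(str, Enum):
    """Hypothetical enum for the warn/fail severity field."""
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)  # frozen: attribute assignment raises after construction
class DataQualityTest:
    test_id: str
    name: str
    description: str
    assertion_type: AssertionType
    table: str
    column: Optional[str] = None
    # Note: the mapping itself is still mutable; only rebinding is blocked.
    parameters: Mapping[str, Any] = field(default_factory=dict)
    sql_expression: Optional[str] = None
    severity: Severity = Severity.FAIL
    source_investigation_id: Optional[str] = None
    failure_description: Optional[str] = None

    @classmethod
    def not_null(
        cls, table: str, column: str, investigation_id: Optional[str] = None
    ) -> "DataQualityTest":
        """Illustrative factory; the real factory signatures are not shown in the PR."""
        return cls(
            test_id=f"not_null_{table}_{column}",
            name=f"{table}.{column} is not null",
            description=f"Column {column} in {table} must not contain NULLs.",
            assertion_type=AssertionType.NOT_NULL,
            table=table,
            column=column,
            source_investigation_id=investigation_id,
        )
```

For example, `DataQualityTest.not_null("orders", "customer_id", investigation_id="inv-1")` builds a record that any renderer could consume, and assigning to one of its attributes afterwards raises an error.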
Implement extract_tests_from_synthesis() function that maps root cause patterns to data quality tests:
- NULL pattern → NOT_NULL test on detected column
- Unexpected values → ACCEPTED_VALUES test with extracted values
- Row count changes → ROW_COUNT_CHANGE test (10% threshold)
- Freshness issues → FRESHNESS test (24h threshold)
- Duplicates → UNIQUE test on detected column
- Referential integrity → REFERENTIAL_INTEGRITY test
- Unmatched patterns → CUSTOM_SQL fallback
Features:
- Confidence threshold (0.6) before generating tests
- Column extraction from root cause text patterns
- Accepted values extraction from evidence
- Reference table extraction for FK tests
- Auto-generated tags for traceability
Refs: fn-51.2
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
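The mapping above can be pictured as an ordered list of text patterns with a fallback. The sketch below is an assumption-heavy illustration: the function name `classify_root_cause`, the regular expressions, and the plain-string return values are all hypothetical; only the 0.6 confidence threshold and the pattern-to-assertion pairs come from the commit message.

```python
import re
from typing import Optional

CONFIDENCE_THRESHOLD = 0.6  # from the commit message

# Ordered (pattern, assertion type) pairs. The keywords are illustrative
# guesses, not the patterns used by extract_tests_from_synthesis().
_PATTERNS = [
    (re.compile(r"\bnulls?\b", re.I), "not_null"),
    (re.compile(r"\bduplicat", re.I), "unique"),
    (re.compile(r"\b(unexpected|invalid) values?\b", re.I), "accepted_values"),
    (re.compile(r"\brow count\b", re.I), "row_count_change"),
    (re.compile(r"\b(stale|fresh|late)", re.I), "freshness"),
    (re.compile(r"\b(orphan|foreign key|referential)", re.I), "referential_integrity"),
]


def classify_root_cause(root_cause: str, confidence: float) -> Optional[str]:
    """Return an assertion type name, or None when confidence is too low."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None  # below threshold: generate no test
    for pattern, assertion in _PATTERNS:
        if pattern.search(root_cause):
            return assertion  # first matching pattern wins
    return "custom_sql"  # unmatched patterns fall back to custom SQL
```

Because the first match wins, the order of the list decides ties when a root cause mentions, say, both NULLs and duplicates. Whether the real extractor emits one test or several in that case is not stated in the PR.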
- Create renderers package with BaseRenderer abstract class
- Add GXRenderer: outputs Great Expectations JSON expectation suite
- Add DbtRenderer: outputs dbt schema.yml test definitions
- Add SodaRenderer: outputs SodaCL check YAML
- Add SQLRenderer: outputs raw SQL assertion queries
- All renderers handle all 8 assertion types from DataQualityTest
- Include provenance comments with investigation ID in all outputs
- Add get_renderer() factory function for format selection
- Add RenderFormat enum for supported formats
- 42 unit tests covering all renderers and edge cases
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
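As a rough picture of how the renderer layer fits together, the sketch below implements an abstract base class, one SQL renderer covering two of the eight assertion types, and a format-selecting factory. `SimpleTest` is a hypothetical stand-in for `DataQualityTest`, and the generated SQL and the comment wording are assumptions; the class and function names `BaseRenderer`, `SQLRenderer`, `RenderFormat`, and `get_renderer` are taken from the commit message, but their real signatures are not shown in the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RenderFormat(str, Enum):
    GX = "gx"
    DBT = "dbt"
    SODA = "soda"
    SQL = "sql"


@dataclass(frozen=True)
class SimpleTest:
    """Hypothetical stand-in for DataQualityTest with only the fields used here."""
    assertion_type: str
    table: str
    column: str
    source_investigation_id: Optional[str] = None


class BaseRenderer(ABC):
    @abstractmethod
    def render(self, test: SimpleTest) -> str:
        """Return the test rendered in the target framework's syntax."""


class SQLRenderer(BaseRenderer):
    def render(self, test: SimpleTest) -> str:
        # Provenance comment so the query can be traced to its investigation.
        header = f"-- Generated from investigation {test.source_investigation_id}"
        # Identifiers are interpolated unescaped; acceptable only for a sketch.
        if test.assertion_type == "not_null":
            body = (
                f"SELECT COUNT(*) AS failures FROM {test.table} "
                f"WHERE {test.column} IS NULL;"
            )
        elif test.assertion_type == "unique":
            body = (
                f"SELECT COUNT(*) AS failures FROM ("
                f"SELECT {test.column} FROM {test.table} "
                f"GROUP BY {test.column} HAVING COUNT(*) > 1) AS dupes;"
            )
        else:
            raise NotImplementedError(f"not covered in this sketch: {test.assertion_type}")
        return f"{header}\n{body}"


def get_renderer(fmt: str) -> BaseRenderer:
    """Pick a renderer by format name; only SQL is sketched here."""
    render_format = RenderFormat(fmt)  # raises ValueError for unknown formats
    if render_format is RenderFormat.SQL:
        return SQLRenderer()
    raise NotImplementedError(f"renderer not sketched: {render_format.value}")
```

A query that returns a non-zero `failures` count indicates a failing assertion under this convention, which is one common way to express SQL checks but not necessarily the one the merged renderer uses.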
Add `dataing codify` CLI command that extracts testable assertions from investigation synthesis and renders them to multiple formats (GX, dbt, Soda, SQL).
Features:
- --format option: gx, dbt, soda, sql (default: sql)
- --output option: write to file (default: stdout)
- --append option: merge with existing files (intelligent merging for dbt/GX)
- Error handling for missing synthesis or low confidence investigations
Closes: fn-51.5
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
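The option surface described above can be sketched with the standard library's `argparse`. This is only an illustration of the flags and defaults: the PR does not say which CLI framework `dataing` uses, and treating `--append` as a boolean flag is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse equivalent of the `dataing codify` options."""
    parser = argparse.ArgumentParser(
        prog="dataing codify",
        description="Generate regression tests from an investigation.",
    )
    parser.add_argument("investigation_id", help="ID of the investigation to codify")
    parser.add_argument(
        "--format",
        dest="fmt",
        choices=["gx", "dbt", "soda", "sql"],
        default="sql",
        help="output format (default: sql)",
    )
    parser.add_argument(
        "--output",
        default=None,
        help="file to write to (default: stdout)",
    )
    parser.add_argument(
        "--append",
        action="store_true",  # assumed to be a flag, not a value option
        help="merge with an existing file instead of overwriting",
    )
    return parser
```

With this parser, `build_parser().parse_args(["inv-123", "--format", "dbt", "--output", "schema.yml", "--append"])` yields `fmt="dbt"`, `output="schema.yml"`, and `append=True`.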
Add "Codify Test" button to investigation detail page that opens a modal with format selection and code preview.
Features:
- Format selector (SQL, dbt, Great Expectations, Soda)
- Syntax-highlighted code display
- Copy to clipboard and download buttons
- Test summary badges
Backend:
- POST /investigations/{id}/codify API endpoint
- Confidence validation (>= 60%)
- Integration with renderers package
Closes: fn-51.4
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestTrackingService to measure codify ROI by tracking:
- Tests generated per investigation
- Test adoption status (manual confirmation)
- Test run results (pass/fail)
- Issues caught by generated tests
API endpoints:
- GET /investigations/tests/stats - tracking statistics
- GET /investigations/tests/catches - recent failures caught
- POST /investigations/tests/adopt - mark test adopted
- POST /investigations/tests/run - record test result
Includes database migration and unit tests.
Closes: fn-51.6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
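The four tracked quantities can be illustrated with an in-memory tracker. Everything here is hypothetical: the class name `InMemoryTestTracker`, its methods, and the shape of the stats dictionary are invented for the sketch, and counting every failing run as a "catch" is an assumption. The service in the PR is backed by a database migration whose schema is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrackedTest:
    test_id: str
    investigation_id: str
    adopted: bool = False
    runs: List[bool] = field(default_factory=list)  # True = passed


class InMemoryTestTracker:
    """Hypothetical in-memory stand-in for TestTrackingService."""

    def __init__(self) -> None:
        self._tests: Dict[str, TrackedTest] = {}

    def record_generated(self, test_id: str, investigation_id: str) -> None:
        self._tests[test_id] = TrackedTest(test_id, investigation_id)

    def mark_adopted(self, test_id: str) -> None:
        # Adoption is a manual confirmation, per the commit message.
        self._tests[test_id].adopted = True

    def record_run(self, test_id: str, passed: bool) -> None:
        self._tests[test_id].runs.append(passed)

    def stats(self) -> Dict[str, int]:
        tests = list(self._tests.values())
        return {
            "generated": len(tests),
            "adopted": sum(t.adopted for t in tests),
            "runs": sum(len(t.runs) for t in tests),
            # Assumption: each failing run counts as one issue caught.
            "failures_caught": sum(t.runs.count(False) for t in tests),
        }
```

Dividing `adopted` by `generated` gives a simple adoption rate, one plausible input to the ROI measurement the commit message mentions.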
Summary
- `dataing codify` CLI command and frontend widget to generate regression tests from investigation synthesis
Changes
Core
- `DataQualityTest` model for framework-agnostic test representation
CLI (`dataing codify`)
- `dataing codify <investigation_id>` - generate tests to stdout
- `--format` option: gx, dbt, soda, sql (default: sql)
- `--output` option: write to file
- `--append` option: merge with existing files (intelligent merge for dbt/GX)
Frontend
API
- `POST /investigations/{id}/codify` - generate tests
- `GET /investigations/tests/stats` - tracking statistics
- `POST /investigations/tests/adopt` - mark test adopted
- `POST /investigations/tests/run` - record test results
Analytics
Test plan
🤖 Generated with Claude Code