feat: Trace-to-Test Codify - Generate Regression Tests from Investigations #105

Merged: bordumb merged 6 commits into main from feat/fn-51-trace-to-test-codify on Jan 31, 2026

@bordumb bordumb commented Jan 31, 2026

Summary

  • Add dataing codify CLI command and frontend widget to generate regression tests from investigation synthesis
  • Support multiple output formats: Great Expectations, dbt, Soda, SQL
  • Track test adoption and effectiveness for ROI measurement

Changes

Core

  • DataQualityTest model for framework-agnostic test representation
  • Rule-based test extraction from investigation synthesis (null checks, uniqueness, thresholds)
  • Test renderers for GX (JSON), dbt (schema.yml), Soda (SodaCL), SQL formats

CLI (dataing codify)

  • dataing codify <investigation_id> - generate tests to stdout
  • --format option: gx, dbt, soda, sql (default: sql)
  • --output option: write to file
  • --append option: merge with existing files (intelligent merge for dbt/GX)
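
The options above could be wired up roughly as follows. This is a minimal sketch using the standard library's argparse; the PR does not say which CLI framework dataing uses, and `build_parser` is a hypothetical name. Only the flag names, the format choices, and the `sql` default come from the description.

```python
import argparse

# Formats listed in the PR description.
FORMATS = ("gx", "dbt", "soda", "sql")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `dataing codify` options."""
    parser = argparse.ArgumentParser(prog="dataing codify")
    parser.add_argument("investigation_id", help="investigation to codify")
    parser.add_argument("--format", choices=FORMATS, default="sql",
                        help="output format (default: sql)")
    parser.add_argument("--output", default=None,
                        help="write to this file instead of stdout")
    parser.add_argument("--append", action="store_true",
                        help="merge with an existing file instead of overwriting")
    return parser


# Example invocation with a made-up investigation ID and file name.
args = build_parser().parse_args(
    ["inv-123", "--format", "dbt", "--output", "schema.yml"]
)
print(args.investigation_id, args.format, args.output, args.append)
```

Restricting `--format` with `choices` rejects unsupported formats at parse time, before any investigation is loaded.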

Frontend

  • "Codify Test" button on completed investigations (>= 60% confidence)
  • Modal with format selector and syntax-highlighted preview
  • Copy to clipboard and download buttons

API

  • POST /investigations/{id}/codify - generate tests
  • GET /investigations/tests/stats - tracking statistics
  • POST /investigations/tests/adopt - mark test adopted
  • POST /investigations/tests/run - record test results

Analytics

  • Track: tests generated, adopted, run count, failure count
  • Metrics: adoption_rate, effectiveness_rate
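
The PR names the two metrics but does not define them. The sketch below shows one plausible reading, assuming adoption_rate is adopted tests divided by generated tests and effectiveness_rate is failed runs divided by total runs. `TrackingCounts` and the helper functions are hypothetical names, not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingCounts:
    """Hypothetical container for the four tracked counters."""
    generated: int
    adopted: int
    run_count: int
    failure_count: int


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of dividing by zero when nothing is tracked yet.
    return numerator / denominator if denominator else 0.0


def adoption_rate(counts: TrackingCounts) -> float:
    # ASSUMPTION: share of generated tests that were marked adopted.
    return _ratio(counts.adopted, counts.generated)


def effectiveness_rate(counts: TrackingCounts) -> float:
    # ASSUMPTION: share of recorded runs that failed, i.e. caught an issue.
    return _ratio(counts.failure_count, counts.run_count)


counts = TrackingCounts(generated=20, adopted=5, run_count=40, failure_count=4)
print(adoption_rate(counts), effectiveness_rate(counts))
```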

Test plan

  • Unit tests for DataQualityTest model
  • Unit tests for test extraction logic
  • Unit tests for all 4 renderers (GX, dbt, Soda, SQL)
  • Unit tests for CLI codify command (11 passing, 1 skipped)
  • Unit tests for test tracking service (11 passing)
  • Frontend type checking passes
  • Pre-commit checks pass (ruff, mypy, eslint)

🤖 Generated with Claude Code

bordumb and others added 6 commits January 31, 2026 10:46
Implement abstract representation of data quality tests that can be
rendered to multiple frameworks (Great Expectations, dbt, Soda, SQL).

- Add AssertionType enum with 8 assertion types: not_null, unique,
  accepted_values, in_range, row_count_change, freshness,
  referential_integrity, custom_sql
- Add ThresholdType enum and AssertionThreshold model for configuring
  test pass/fail thresholds
- Add DataQualityTest model with:
  - test_id, name, description for identification
  - assertion_type, table, column for scope
  - parameters, threshold, sql_expression for configuration
  - severity (warn/fail) for pipeline behavior
  - source_investigation_id, failure_description for traceability
- Add factory methods for each assertion type
- All models are frozen (immutable) for thread safety

Refs: fn-51.1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
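
For readers unfamiliar with the shape of such a model, here is a minimal sketch of a frozen, framework-agnostic test record. It uses standard-library dataclasses as a stand-in; the commit does not show the real base class, field types, defaults, or factory signatures, so all of those are assumptions. The threshold field is omitted for brevity. Only the field names and the eight assertion type values come from the commit message.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class AssertionType(str, Enum):
    # The eight assertion types named in the commit message.
    NOT_NULL = "not_null"
    UNIQUE = "unique"
    ACCEPTED_VALUES = "accepted_values"
    IN_RANGE = "in_range"
    ROW_COUNT_CHANGE = "row_count_change"
    FRESHNESS = "freshness"
    REFERENTIAL_INTEGRITY = "referential_integrity"
    CUSTOM_SQL = "custom_sql"


@dataclass(frozen=True)  # frozen: attribute assignment raises after creation
class DataQualityTest:
    test_id: str
    name: str
    assertion_type: AssertionType
    table: str
    column: Optional[str] = None
    description: str = ""
    parameters: dict[str, Any] = field(default_factory=dict)
    sql_expression: Optional[str] = None
    severity: str = "fail"  # "warn" or "fail"
    source_investigation_id: Optional[str] = None
    failure_description: Optional[str] = None

    @classmethod
    def not_null(cls, table: str, column: str,
                 investigation_id: Optional[str] = None) -> "DataQualityTest":
        # ASSUMPTION: hypothetical factory; the ID and name schemes are invented.
        return cls(
            test_id=f"{table}.{column}.not_null",
            name=f"{column} is never NULL",
            assertion_type=AssertionType.NOT_NULL,
            table=table,
            column=column,
            source_investigation_id=investigation_id,
        )


test = DataQualityTest.not_null("orders", "customer_id", "inv-123")
print(test.test_id, test.assertion_type.value)
```
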
Implement extract_tests_from_synthesis() function that maps root cause
patterns to data quality tests:

- NULL pattern → NOT_NULL test on detected column
- Unexpected values → ACCEPTED_VALUES test with extracted values
- Row count changes → ROW_COUNT_CHANGE test (10% threshold)
- Freshness issues → FRESHNESS test (24h threshold)
- Duplicates → UNIQUE test on detected column
- Referential integrity → REFERENTIAL_INTEGRITY test
- Unmatched patterns → CUSTOM_SQL fallback

Features:
- Confidence threshold (0.6) before generating tests
- Column extraction from root cause text patterns
- Accepted values extraction from evidence
- Reference table extraction for FK tests
- Auto-generated tags for traceability

Refs: fn-51.2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
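
The mapping above can be pictured as an ordered list of pattern rules with a fallback. The sketch below is illustrative only: `classify_root_cause`, the keyword patterns, and the column regex are assumptions, not the logic inside extract_tests_from_synthesis(). The 0.6 confidence threshold and the CUSTOM_SQL fallback come from the commit message.

```python
import re
from typing import Optional

CONFIDENCE_THRESHOLD = 0.6  # from the commit message

# Ordered (pattern, assertion type) rules. The keywords are illustrative guesses.
RULES = [
    (re.compile(r"\bnull", re.I), "not_null"),
    (re.compile(r"\bduplicate", re.I), "unique"),
    (re.compile(r"\b(unexpected|invalid) value", re.I), "accepted_values"),
    (re.compile(r"\brow count\b", re.I), "row_count_change"),
    (re.compile(r"\b(stale|fresh|late)", re.I), "freshness"),
    (re.compile(r"\b(orphan|foreign key|referential)", re.I), "referential_integrity"),
]

# ASSUMPTION: columns are mentioned as "column <name>" in the root cause text.
COLUMN_RE = re.compile(r"column\s+`?(\w+)", re.I)


def classify_root_cause(root_cause: str, confidence: float) -> Optional[dict]:
    """Return the first matching rule, or None below the confidence threshold."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None
    column_match = COLUMN_RE.search(root_cause)
    column = column_match.group(1) if column_match else None
    for pattern, assertion_type in RULES:
        if pattern.search(root_cause):
            return {"assertion_type": assertion_type, "column": column}
    # No rule matched: fall back to a custom SQL test.
    return {"assertion_type": "custom_sql", "column": column}


print(classify_root_cause("NULL values in column customer_id", 0.9))
```

Because the rules are checked in order, a root cause that mentions both NULLs and duplicates would yield only the first match in this sketch.
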
- Create renderers package with BaseRenderer abstract class
- Add GXRenderer: outputs Great Expectations JSON expectation suite
- Add DbtRenderer: outputs dbt schema.yml test definitions
- Add SodaRenderer: outputs SodaCL check YAML
- Add SQLRenderer: outputs raw SQL assertion queries
- All renderers handle all 8 assertion types from DataQualityTest
- Include provenance comments with investigation ID in all outputs
- Add get_renderer() factory function for format selection
- Add RenderFormat enum for supported formats
- 42 unit tests covering all renderers and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
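
A renderer package like this typically pairs an abstract base class with a format-keyed factory. The sketch below assumes a `render(test)` method that takes a plain dict and returns a string, and a `get_renderer(fmt)` that accepts the format name; the real signatures are not shown in the commit. Only the SQL renderer is sketched, and only for two assertion types.

```python
from abc import ABC, abstractmethod
from enum import Enum


class RenderFormat(str, Enum):
    GX = "gx"
    DBT = "dbt"
    SODA = "soda"
    SQL = "sql"


class BaseRenderer(ABC):
    @abstractmethod
    def render(self, test: dict) -> str:
        """Render one test to the target format (assumed signature)."""


class SQLRenderer(BaseRenderer):
    def render(self, test: dict) -> str:
        # Provenance comment carrying the investigation ID.
        header = f"-- Generated from investigation {test['source_investigation_id']}"
        if test["assertion_type"] == "not_null":
            body = (f"SELECT COUNT(*) AS failures FROM {test['table']} "
                    f"WHERE {test['column']} IS NULL;")
        elif test["assertion_type"] == "unique":
            body = (f"SELECT {test['column']}, COUNT(*) AS n FROM {test['table']} "
                    f"GROUP BY {test['column']} HAVING COUNT(*) > 1;")
        else:
            raise NotImplementedError(test["assertion_type"])
        return f"{header}\n{body}"


# Only SQL is registered in this sketch.
_RENDERERS = {RenderFormat.SQL: SQLRenderer}


def get_renderer(fmt: str) -> BaseRenderer:
    try:
        return _RENDERERS[RenderFormat(fmt)]()
    except (ValueError, KeyError):
        raise ValueError(f"unsupported format: {fmt}") from None


print(get_renderer("sql").render({
    "assertion_type": "not_null",
    "table": "orders",
    "column": "customer_id",
    "source_investigation_id": "inv-123",
}))
```

The table and column names are interpolated directly here, so a real renderer would need to quote or validate identifiers.
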
Add `dataing codify` CLI command that extracts testable assertions from
investigation synthesis and renders them to multiple formats (GX, dbt, Soda, SQL).

Features:
- --format option: gx, dbt, soda, sql (default: sql)
- --output option: write to file (default: stdout)
- --append option: merge with existing files (intelligent merging for dbt/GX)
- Error handling for missing synthesis or low confidence investigations

Closes: fn-51.5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
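
The commit does not describe the merge rules behind --append. One way such a merge could work for Great Expectations output is sketched below, assuming the pre-1.0 suite layout with an `expectations` list of `expectation_type` and `kwargs` entries. `merge_expectation_suites` is a hypothetical helper that appends only expectations not already present.

```python
import json


def merge_expectation_suites(existing: dict, generated: dict) -> dict:
    """Append generated expectations that the existing suite lacks."""

    def key(expectation: dict) -> str:
        # Two expectations count as duplicates when type and kwargs match.
        return json.dumps(
            [expectation["expectation_type"], expectation.get("kwargs", {})],
            sort_keys=True,
        )

    merged = dict(existing)  # keep suite-level fields from the existing file
    expectations = list(existing.get("expectations", []))
    seen = {key(e) for e in expectations}
    for expectation in generated.get("expectations", []):
        if key(expectation) not in seen:
            expectations.append(expectation)
            seen.add(key(expectation))
    merged["expectations"] = expectations
    return merged


existing = {
    "expectation_suite_name": "orders_suite",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "customer_id"}},
    ],
}
generated = {
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "customer_id"}},
        {"expectation_type": "expect_column_values_to_be_unique",
         "kwargs": {"column": "order_id"}},
    ],
}
print(len(merge_expectation_suites(existing, generated)["expectations"]))
```
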
Add "Codify Test" button to investigation detail page that opens a modal
with format selection and code preview. Features:
- Format selector (SQL, dbt, Great Expectations, Soda)
- Syntax-highlighted code display
- Copy to clipboard and download buttons
- Test summary badges

Backend:
- POST /investigations/{id}/codify API endpoint
- Confidence validation (>= 60%)
- Integration with renderers package

Closes: fn-51.4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestTrackingService to measure codify ROI by tracking:
- Tests generated per investigation
- Test adoption status (manual confirmation)
- Test run results (pass/fail)
- Issues caught by generated tests

API endpoints:
- GET /investigations/tests/stats - tracking statistics
- GET /investigations/tests/catches - recent failures caught
- POST /investigations/tests/adopt - mark test adopted
- POST /investigations/tests/run - record test result

Includes database migration and unit tests.

Closes: fn-51.6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vercel bot commented Jan 31, 2026

The latest updates on your projects:

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| dataing | Ready | Jan 31, 2026 3:37pm |
| dataing-app | Ready | Jan 31, 2026 3:37pm |
| dataing-docs | Ready | Jan 31, 2026 3:37pm |

@bordumb bordumb merged commit 2400676 into main Jan 31, 2026
6 checks passed
@bordumb bordumb deleted the feat/fn-51-trace-to-test-codify branch January 31, 2026 15:46