feat: Trace-to-Test Codify - Generate Regression Tests from Investigations by bordumb · Pull Request #105 · bordumb/dataing

bordumb · 2026-01-31T15:36:27Z

Summary

Add dataing codify CLI command and frontend widget to generate regression tests from investigation synthesis
Support multiple output formats: Great Expectations, dbt, Soda, SQL
Track test adoption and effectiveness for ROI measurement

Changes

Core

DataQualityTest model for framework-agnostic test representation
Rule-based test extraction from investigation synthesis (null checks, uniqueness, thresholds)
Test renderers for GX (JSON), dbt (schema.yml), Soda (SodaCL), SQL formats

CLI (`dataing codify`)

dataing codify <investigation_id> - generate tests to stdout
--format option: gx, dbt, soda, sql (default: sql)
--output option: write to file
--append option: merge with existing files (intelligent merge for dbt/GX)

Frontend

"Codify Test" button on completed investigations (>60% confidence)
Modal with format selector and syntax-highlighted preview
Copy to clipboard and download buttons

API

POST /investigations/{id}/codify - generate tests
GET /investigations/tests/stats - tracking statistics
POST /investigations/tests/adopt - mark test adopted
POST /investigations/tests/run - record test results

Analytics

Track: tests generated, adopted, run count, failure count
Metrics: adoption_rate, effectiveness_rate

Test plan

Unit tests for DataQualityTest model
Unit tests for test extraction logic
Unit tests for all 4 renderers (GX, dbt, Soda, SQL)
Unit tests for CLI codify command (11 passing, 1 skipped)
Unit tests for test tracking service (11 passing)
Frontend type checking passes
Pre-commit checks pass (ruff, mypy, eslint)

🤖 Generated with Claude Code

Implement abstract representation of data quality tests that can be rendered to multiple frameworks (Great Expectations, dbt, Soda, SQL).

- Add AssertionType enum with 8 assertion types: not_null, unique, accepted_values, in_range, row_count_change, freshness, referential_integrity, custom_sql
- Add ThresholdType enum and AssertionThreshold model for configuring test pass/fail thresholds
- Add DataQualityTest model with:
  - test_id, name, description for identification
  - assertion_type, table, column for scope
  - parameters, threshold, sql_expression for configuration
  - severity (warn/fail) for pipeline behavior
  - source_investigation_id, failure_description for traceability
- Add factory methods for each assertion type
- All models are frozen (immutable) for thread safety

Refs: fn-51.1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
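To make the shape of this model concrete, here is a minimal sketch of what a frozen, framework-agnostic test record could look like. It uses stdlib dataclasses as an assumption (the PR does not say which modelling library is used), omits the threshold field for brevity, and the `Severity` enum and the exact signature of the `not_null` factory are hypothetical illustrations, not the PR's actual code.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class AssertionType(str, Enum):
    """The 8 assertion types listed in the commit message."""
    NOT_NULL = "not_null"
    UNIQUE = "unique"
    ACCEPTED_VALUES = "accepted_values"
    IN_RANGE = "in_range"
    ROW_COUNT_CHANGE = "row_count_change"
    FRESHNESS = "freshness"
    REFERENTIAL_INTEGRITY = "referential_integrity"
    CUSTOM_SQL = "custom_sql"


class Severity(str, Enum):
    """Hypothetical enum for the warn/fail severity field."""
    WARN = "warn"
    FAIL = "fail"


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class DataQualityTest:
    test_id: str
    name: str
    description: str
    assertion_type: AssertionType
    table: str
    column: str | None = None
    parameters: dict[str, Any] = field(default_factory=dict)
    sql_expression: str | None = None
    severity: Severity = Severity.FAIL
    source_investigation_id: str | None = None
    failure_description: str | None = None

    @classmethod
    def not_null(
        cls, table: str, column: str, *, investigation_id: str | None = None
    ) -> DataQualityTest:
        """Factory for a NOT NULL check (signature is an assumption)."""
        return cls(
            test_id=str(uuid.uuid4()),
            name=f"{table}_{column}_not_null",
            description=f"{table}.{column} must not contain NULL values",
            assertion_type=AssertionType.NOT_NULL,
            table=table,
            column=column,
            source_investigation_id=investigation_id,
        )
```

Freezing the dataclass means an attempt such as `test.table = "other"` raises `dataclasses.FrozenInstanceError`, which is the property the commit relies on for safe sharing between threads.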

Implement extract_tests_from_synthesis() function that maps root cause patterns to data quality tests:

- NULL pattern → NOT_NULL test on detected column
- Unexpected values → ACCEPTED_VALUES test with extracted values
- Row count changes → ROW_COUNT_CHANGE test (10% threshold)
- Freshness issues → FRESHNESS test (24h threshold)
- Duplicates → UNIQUE test on detected column
- Referential integrity → REFERENTIAL_INTEGRITY test
- Unmatched patterns → CUSTOM_SQL fallback

Features:

- Confidence threshold (0.6) before generating tests
- Column extraction from root cause text patterns
- Accepted values extraction from evidence
- Reference table extraction for FK tests
- Auto-generated tags for traceability

Refs: fn-51.2
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
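The rule-based mapping above can be sketched as an ordered list of keyword patterns with a fallback. This is an illustration only: the regular expressions, the `classify_root_cause` helper, and the dictionary it returns are hypothetical, and the real `extract_tests_from_synthesis()` may match on different signals. Only the 0.6 confidence threshold and the pattern-to-assertion pairs come from the commit message.

```python
import re
from typing import Optional

CONFIDENCE_THRESHOLD = 0.6  # from the commit message

# Ordered rules: the first pattern found in the root cause text wins.
# The keyword patterns themselves are assumptions for illustration.
RULES = [
    (re.compile(r"\bnulls?\b", re.I), "not_null"),
    (re.compile(r"\b(unexpected|invalid) values?\b", re.I), "accepted_values"),
    (re.compile(r"\brow count\b", re.I), "row_count_change"),
    (re.compile(r"\b(stale|freshness|not updated)\b", re.I), "freshness"),
    (re.compile(r"\bduplicat", re.I), "unique"),
    (re.compile(r"\b(orphan\w*|foreign key|referential)\b", re.I),
     "referential_integrity"),
]

# Naive column extraction: the identifier right after the word "column".
COLUMN_RE = re.compile(r"column\s+[`'\"]?([A-Za-z_][A-Za-z0-9_]*)", re.I)


def classify_root_cause(root_cause: str, confidence: float) -> Optional[dict]:
    """Map a root cause description to an assertion type and column.

    Returns None when confidence is below the threshold, so that no test
    is generated for low-confidence investigations.
    """
    if confidence < CONFIDENCE_THRESHOLD:
        return None
    match = COLUMN_RE.search(root_cause)
    column = match.group(1) if match else None
    for pattern, assertion_type in RULES:
        if pattern.search(root_cause):
            return {"assertion_type": assertion_type, "column": column}
    # Nothing matched: fall back to a custom SQL assertion.
    return {"assertion_type": "custom_sql", "column": column}
```

Keeping the rules in an ordered list makes precedence explicit when a root cause mentions several symptoms at once. The column regex is deliberately simple and would misfire on phrasing such as "column contains", so a real extractor would need stricter matching.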

- Create renderers package with BaseRenderer abstract class
- Add GXRenderer: outputs Great Expectations JSON expectation suite
- Add DbtRenderer: outputs dbt schema.yml test definitions
- Add SodaRenderer: outputs SodaCL check YAML
- Add SQLRenderer: outputs raw SQL assertion queries
- All renderers handle all 8 assertion types from DataQualityTest
- Include provenance comments with investigation ID in all outputs
- Add get_renderer() factory function for format selection
- Add RenderFormat enum for supported formats
- 42 unit tests covering all renderers and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
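As a rough sketch of the renderer layout described above, the following shows an abstract base class, one concrete SQL renderer, and a factory keyed on a format enum. The method name `render`, the plain-dict test representation, and the generated SQL are assumptions made for illustration. Only two assertion types are handled here, whereas the PR states the real renderers cover all eight.

```python
from abc import ABC, abstractmethod
from enum import Enum


class RenderFormat(str, Enum):
    GX = "gx"
    DBT = "dbt"
    SODA = "soda"
    SQL = "sql"


class BaseRenderer(ABC):
    @abstractmethod
    def render(self, test: dict) -> str:
        """Return the test rendered as text in the target format."""


class SQLRenderer(BaseRenderer):
    def render(self, test: dict) -> str:
        table = test["table"]
        column = test["column"]
        # Provenance comment linking the output back to the investigation.
        header = f"-- Generated from investigation {test['source_investigation_id']}"
        if test["assertion_type"] == "not_null":
            query = (
                f"SELECT COUNT(*) AS failures FROM {table} "
                f"WHERE {column} IS NULL;"
            )
        elif test["assertion_type"] == "unique":
            query = (
                f"SELECT {column}, COUNT(*) AS occurrences FROM {table} "
                f"GROUP BY {column} HAVING COUNT(*) > 1;"
            )
        else:
            raise NotImplementedError(test["assertion_type"])
        return header + "\n" + query


# Only the SQL renderer is sketched; the other formats would register here.
_RENDERERS = {RenderFormat.SQL: SQLRenderer}


def get_renderer(fmt: str) -> BaseRenderer:
    """Look up a renderer by format name; raises ValueError on unknown names."""
    render_format = RenderFormat(fmt)
    if render_format not in _RENDERERS:
        raise NotImplementedError(f"no renderer sketched for {render_format.value}")
    return _RENDERERS[render_format]()
```

A caller would select a renderer with `get_renderer("sql")` and pass it a test record. The sketch interpolates identifiers directly into SQL, which is acceptable only because table and column names here are assumed to come from trusted investigation metadata.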

Add `dataing codify` CLI command that extracts testable assertions from investigation synthesis and renders them to multiple formats (GX, dbt, Soda, SQL).

Features:

- --format option: gx, dbt, soda, sql (default: sql)
- --output option: write to file (default: stdout)
- --append option: merge with existing files (intelligent merging for dbt/GX)
- Error handling for missing synthesis or low confidence investigations

Closes: fn-51.5
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
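The option surface listed above can be approximated with a small argparse sketch. The PR does not say which CLI library `dataing` uses, so argparse, `build_parser`, and `write_output` are all assumptions. The append behavior shown is a plain text append, which would suit SQL output at most; the structure-aware merging for dbt and GX files mentioned in the commit is not reproduced here.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options of `dataing codify` (sketch only)."""
    parser = argparse.ArgumentParser(prog="dataing codify")
    parser.add_argument("investigation_id")
    parser.add_argument(
        "--format", choices=["gx", "dbt", "soda", "sql"], default="sql"
    )
    parser.add_argument("--output", type=Path, default=None)
    parser.add_argument("--append", action="store_true")
    return parser


def write_output(rendered: str, output: Optional[Path], append: bool) -> None:
    """Write rendered tests to stdout, or to a file when --output is given."""
    if output is None:
        sys.stdout.write(rendered + "\n")
        return
    # Naive append: adds text to the end of an existing file.
    mode = "a" if append and output.exists() else "w"
    with output.open(mode, encoding="utf-8") as handle:
        handle.write(rendered + "\n")
```

With this parser, `build_parser().parse_args(["<investigation_id>", "--format", "dbt", "--output", "schema.yml", "--append"])` yields the same four values the real command accepts.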

Add "Codify Test" button to investigation detail page that opens a modal with format selection and code preview.

Features:

- Format selector (SQL, dbt, Great Expectations, Soda)
- Syntax-highlighted code display
- Copy to clipboard and download buttons
- Test summary badges

Backend:

- POST /investigations/{id}/codify API endpoint
- Confidence validation (>= 60%)
- Integration with renderers package

Closes: fn-51.4
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add TestTrackingService to measure codify ROI by tracking:

- Tests generated per investigation
- Test adoption status (manual confirmation)
- Test run results (pass/fail)
- Issues caught by generated tests

API endpoints:

- GET /investigations/tests/stats - tracking statistics
- GET /investigations/tests/catches - recent failures caught
- POST /investigations/tests/adopt - mark test adopted
- POST /investigations/tests/run - record test result

Includes database migration and unit tests.

Closes: fn-51.6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
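The PR names the metrics adoption_rate and effectiveness_rate but does not give their formulas. The sketch below shows one plausible reading, purely as an assumption: adoption rate as adopted tests over generated tests, and effectiveness rate as recorded failures over recorded runs. The `TrackingStats` class is hypothetical and is not the PR's TestTrackingService.

```python
from dataclasses import dataclass


@dataclass
class TrackingStats:
    """Counters named in the PR summary; the formulas below are assumed."""
    generated: int
    adopted: int
    run_count: int
    failure_count: int

    @property
    def adoption_rate(self) -> float:
        # Assumed definition: share of generated tests that were adopted.
        return self.adopted / self.generated if self.generated else 0.0

    @property
    def effectiveness_rate(self) -> float:
        # Assumed definition: share of recorded runs that caught a failure.
        return self.failure_count / self.run_count if self.run_count else 0.0
```

Both properties return 0.0 when the denominator is zero, so a stats endpoint can respond before any tests have been generated or run.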

vercel · 2026-01-31T15:36:29Z


Project	Deployment	Actions	Updated (UTC)
dataing	Ready	Preview, Comment	Jan 31, 2026 3:37pm
dataing-app	Ready	Preview, Comment	Jan 31, 2026 3:37pm
dataing-docs	Ready	Preview, Comment	Jan 31, 2026 3:37pm

bordumb and others added 6 commits January 31, 2026 10:46

vercel bot deployed to Preview – dataing-docs January 31, 2026 15:36 View deployment

vercel bot deployed to Preview – dataing-app January 31, 2026 15:36 View deployment

vercel bot deployed to Preview – dataing January 31, 2026 15:37 View deployment

bordumb merged commit 2400676 into main Jan 31, 2026
6 checks passed

bordumb deleted the feat/fn-51-trace-to-test-codify branch January 31, 2026 15:46
