safedata evaluates data-filtering systems on four dimensions:
- Safety efficacy
  - Toxic leakage rate
  - PII leakage rate
  - Poison recall at low false-positive rates
- Calibration quality
  - Expected calibration error (ECE)
  - Reliability diagrams
  - Threshold stability under distribution shift
- Fairness impact
  - Demographic parity difference
  - Equal opportunity difference
  - Counterfactual prediction shift under demographic-term swaps
- Robustness under attack
  - Accuracy/recall drop under evasion transformations
  - Detector bypass rate for backdoor/label-flip perturbations
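For the safety-efficacy metrics, a minimal sketch might look like the following. It assumes that "leakage rate" means the fraction of truly harmful items (toxic or PII-bearing) that the filter keeps, and that "poison recall at low false-positive rates" means the best recall on poisoned items over all score thresholds whose false-positive rate on clean items stays at or below a budget. The function names and the 1% default budget are hypothetical, not safedata's API.

```python
from typing import Sequence


def leakage_rate(is_harmful: Sequence[bool], kept: Sequence[bool]) -> float:
    """Fraction of harmful items (e.g. toxic or PII-bearing) that the filter keeps."""
    harmful_total = sum(1 for h in is_harmful if h)
    harmful_kept = sum(1 for h, k in zip(is_harmful, kept) if h and k)
    return harmful_kept / harmful_total if harmful_total else 0.0


def recall_at_fpr(scores: Sequence[float], is_poison: Sequence[bool],
                  max_fpr: float = 0.01) -> float:
    """Best poison recall over thresholds whose FPR on clean items is <= max_fpr.

    An item is flagged when its score is >= the threshold (hypothetical convention).
    """
    clean = [s for s, p in zip(scores, is_poison) if not p]
    poison = [s for s, p in zip(scores, is_poison) if p]
    if not clean or not poison:
        raise ValueError("need both clean and poisoned items")
    best = 0.0
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in clean) / len(clean)
        if fpr <= max_fpr:
            best = max(best, sum(s >= t for s in poison) / len(poison))
    return best
```

The same `leakage_rate` call covers both toxic and PII leakage; only the harm labels passed in differ.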
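ECE and reliability diagrams can share one binning step. This sketch assumes a binary filter that outputs a probability of "unsafe", ten equal-width bins, and the positive-class form of ECE (per-bin gap between mean predicted probability and observed positive rate, weighted by bin size). The top-label variant, which bins by max(p, 1 - p), is another common choice. Function names are hypothetical.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Per-bin (mean predicted probability, observed positive rate, count).

    Plotting mean probability against positive rate gives a reliability diagram.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:  # skip empty bins
            mean_p = sum(p for p, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            rows.append((mean_p, pos_rate, len(b)))
    return rows


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean |observed rate - mean probability| over non-empty bins."""
    n = len(probs)
    return sum(count / n * abs(pos_rate - mean_p)
               for mean_p, pos_rate, count in reliability_bins(probs, labels, n_bins))
```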
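The fairness metrics can be sketched for binary flag decisions as follows. It assumes each metric is reported as the largest gap between any two groups, and that counterfactual shift is the mean absolute change in a hypothetical `score_fn` when demographic terms are swapped. The whitespace-token swap is a simplification; real term lists and tokenization would need more care.

```python
from collections import defaultdict


def _max_gap(rates_by_group):
    rates = list(rates_by_group.values())
    return max(rates) - min(rates)


def demographic_parity_difference(preds, groups):
    """Largest gap in flag rate P(pred = 1 | group) between any two groups."""
    by_group = defaultdict(list)
    for p, g in zip(preds, groups):
        by_group[g].append(p)
    return _max_gap({g: sum(v) / len(v) for g, v in by_group.items()})


def equal_opportunity_difference(preds, labels, groups):
    """Largest gap in true-positive rate P(pred = 1 | label = 1, group) between groups."""
    by_group = defaultdict(list)
    for p, y, g in zip(preds, labels, groups):
        if y:
            by_group[g].append(p)
    return _max_gap({g: sum(v) / len(v) for g, v in by_group.items()})


def counterfactual_shift(texts, score_fn, swaps):
    """Mean |score(x) - score(x')| where x' swaps demographic terms token by token."""
    table = {**swaps, **{v: k for k, v in swaps.items()}}  # make swaps bidirectional

    def swap(text):
        return " ".join(table.get(tok, tok) for tok in text.split())

    return sum(abs(score_fn(t) - score_fn(swap(t))) for t in texts) / len(texts)
```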
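The robustness metrics compare a detector's behaviour before and after perturbation. In this sketch, `detector` is a hypothetical callable that returns True when an item is flagged, an evasion transform is any text-to-text function, and bypass rate is the share of known backdoor or label-flip items that go unflagged. The leetspeak transform in the usage lines is only an illustrative attack.

```python
def recall(flags, labels):
    """Share of positive (harmful) items that are flagged."""
    pos = [f for f, y in zip(flags, labels) if y]
    return sum(pos) / len(pos) if pos else 0.0


def accuracy(flags, labels):
    return sum(int(f) == int(y) for f, y in zip(flags, labels)) / len(labels)


def evasion_drop(detector, texts, labels, transform):
    """Accuracy and recall lost when each input passes through an evasion transform."""
    clean = [detector(t) for t in texts]
    attacked = [detector(transform(t)) for t in texts]
    return {
        "accuracy_drop": accuracy(clean, labels) - accuracy(attacked, labels),
        "recall_drop": recall(clean, labels) - recall(attacked, labels),
    }


def bypass_rate(detector, perturbed_items):
    """Share of known backdoor / label-flip items that the detector fails to flag."""
    missed = sum(1 for x in perturbed_items if not detector(x))
    return missed / len(perturbed_items)


# Usage with a toy keyword detector and a leetspeak evasion transform.
detector = lambda t: "toxic" in t.lower()
leet = lambda t: t.translate(str.maketrans({"o": "0"}))
drops = evasion_drop(detector, ["toxic rant", "nice day", "so toxic"], [1, 0, 1], leet)
```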

Evaluation protocol:
- Evaluate on clean and adversarially transformed variants.
- Report aggregate and slice-wise metrics.
- Include negative results and explicit failure examples.
- Treat calibration as a prerequisite for setting policy thresholds.
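The first three protocol rules can be combined in a small harness. This is a sketch under assumptions: items are a list of `(text, label, slice_name)` tuples, variants map a name to a text transform (with an identity transform for the clean set), accuracy stands in for whichever metric is reported, and the `"ALL"` key for the aggregate is a hypothetical convention. Calibration checks are not covered here.

```python
from collections import defaultdict


def evaluate(detector, items, variants, max_failures=3):
    """Accuracy per (variant, slice) plus an aggregate "ALL" slice, and sample failures.

    items: list of (text, label, slice_name) tuples (hypothetical layout).
    variants: mapping of variant name -> text transform; include an identity for "clean".
    """
    report = {}
    failures = defaultdict(list)
    for name, transform in variants.items():
        counts = defaultdict(lambda: [0, 0])  # slice -> [correct, total]
        for text, label, slice_name in items:
            x = transform(text)
            correct = int(detector(x)) == int(label)
            for key in (slice_name, "ALL"):
                counts[key][0] += correct
                counts[key][1] += 1
            if not correct and len(failures[name]) < max_failures:
                failures[name].append((x, label))  # keep explicit failure examples
        for key, (n_correct, n_total) in counts.items():
            report[(name, key)] = n_correct / n_total
    return report, dict(failures)
```

Keeping the failing inputs alongside the scores makes it straightforward to publish concrete failure examples next to the slice-wise numbers.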

Report template:
- Research question
- Experimental setup
- Main findings
- Negative results
- What I learned
- Open questions