Evaluation Framework

safedata evaluates data-filtering systems on four dimensions:

  1. Safety efficacy
     • Toxic leakage rate
     • PII leakage rate
     • Poison recall at low false-positive rates
  2. Calibration quality
     • Expected calibration error (ECE)
     • Reliability diagrams
     • Threshold stability under distribution shift
  3. Fairness impact
     • Demographic parity difference
     • Equal opportunity difference
     • Counterfactual prediction shift under demographic term swaps
  4. Robustness under attack
     • Accuracy/recall drop under evasion transformations
     • Detector bypass rate for backdoor/label-flip perturbations
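
Several of the metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not safedata's implementation: every function name here is hypothetical, "leakage rate" is assumed to mean the fraction of harmful items that the filter keeps, ECE uses the equal-width-bin variant on binary positive-class probabilities, and the fairness gaps are taken as max minus min across groups.

```python
from typing import Dict, Hashable, List, Sequence


def leakage_rate(is_harmful: Sequence[int], kept: Sequence[int]) -> float:
    """Assumed definition: share of harmful items that pass the filter."""
    total = sum(1 for h in is_harmful if h)
    leaked = sum(1 for h, k in zip(is_harmful, kept) if h and k)
    return leaked / total if total else 0.0


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float) -> float:
    """Best recall over all thresholds whose false-positive rate <= max_fpr."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0:
        return 0.0
    best = 0.0
    for t in sorted(set(scores)):
        # An item is flagged when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fpr = fp / negatives if negatives else 0.0
        if fpr <= max_fpr:
            best = max(best, tp / positives)
    return best


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted mean |observed positive rate - mean predicted prob| per bin."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        confidence = sum(p for p, _ in b) / len(b)
        observed = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(observed - confidence)
    return ece


def _rate_gap(flags_by_group: Dict[Hashable, List[int]]) -> float:
    """Max minus min of the per-group mean of 0/1 flags."""
    rates = [sum(v) / len(v) for v in flags_by_group.values() if v]
    return max(rates) - min(rates) if rates else 0.0


def demographic_parity_difference(preds: Sequence[int],
                                  groups: Sequence[Hashable]) -> float:
    """Gap in flag rate between the most- and least-flagged groups."""
    by_group: Dict[Hashable, List[int]] = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(int(p))
    return _rate_gap(by_group)


def equal_opportunity_difference(preds: Sequence[int], labels: Sequence[int],
                                 groups: Sequence[Hashable]) -> float:
    """Gap in true-positive rate across groups (positives only)."""
    by_group: Dict[Hashable, List[int]] = {}
    for p, y, g in zip(preds, labels, groups):
        if y:
            by_group.setdefault(g, []).append(int(p))
    return _rate_gap(by_group)
```

The threshold sweep in `recall_at_fpr` is quadratic in the number of distinct scores, which is fine for a sanity check but would likely be replaced by a sorted single pass (or a library ROC routine) on real datasets.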

Protocol

  • Evaluate on clean and adversarially transformed variants.
  • Report aggregate and slice-wise metrics.
  • Include negative results and explicit failure examples.
  • Treat calibration as a prerequisite for setting policy thresholds.
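
One way to follow the first two protocol points is to compute recall per slice and in aggregate, once on clean text and once per transformation, and report the drop. The sketch below assumes a detector is any callable from text to a boolean flag; `recall_by_slice`, `robustness_report`, the toy detector, and the leetspeak transform are all hypothetical names invented for illustration, not part of safedata.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# (text, is_harmful, slice name); the key "aggregate" is reserved below.
Example = Tuple[str, bool, str]
Detector = Callable[[str], bool]
Transform = Callable[[str], str]


def recall_by_slice(detector: Detector, examples: Iterable[Example],
                    transform: Transform = lambda text: text) -> Dict[str, float]:
    """Recall on harmful examples, overall and for each slice."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for text, is_harmful, slice_name in examples:
        if not is_harmful:
            continue  # recall only considers harmful items
        flagged = detector(transform(text))
        for key in ("aggregate", slice_name):
            totals[key] += 1
            hits[key] += int(flagged)
    return {key: hits[key] / totals[key] for key in totals}


def robustness_report(detector: Detector, examples: Iterable[Example],
                      transforms: Dict[str, Transform]) -> Dict[str, Dict[str, float]]:
    """Clean recall, recall under each transform, and the per-slice drop."""
    examples = list(examples)
    clean = recall_by_slice(detector, examples)
    report = {"clean": clean}
    for name, fn in transforms.items():
        attacked = recall_by_slice(detector, examples, fn)
        report[name] = attacked
        report[name + "_drop"] = {k: clean[k] - attacked[k] for k in clean}
    return report


# Toy stand-ins, purely illustrative.
def toy_detector(text: str) -> bool:
    return "badword" in text.lower()


def leetspeak(text: str) -> str:
    return text.replace("a", "4").replace("o", "0")


examples = [
    ("this has badword in it", True, "forum"),
    ("BADWORD again", True, "forum"),
    ("clean sentence", False, "forum"),
    ("badword in chat", True, "chat"),
    ("hello there", False, "chat"),
]
report = robustness_report(toy_detector, examples, {"leetspeak": leetspeak})
```

On this toy data the aggregate drop hides how uneven the damage is: the "chat" slice loses all of its recall under the transform while "forum" loses half, which is the kind of slice-wise negative result the protocol asks to be reported alongside the aggregate.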

Reporting template

  • Research question
  • Experimental setup
  • Main findings
  • Negative results
  • What I learned
  • Open questions