Skip to content

Fix switch/flip error rate calculations#131

Open
JosephLalli wants to merge 1 commit intoodelaneau:mainfrom
JosephLalli:fix-flip-switch-tracking
Open

Fix switch/flip error rate calculations#131
JosephLalli wants to merge 1 commit intoodelaneau:mainfrom
JosephLalli:fix-flip-switch-tracking

Conversation

@JosephLalli
Copy link
Copy Markdown

@JosephLalli JosephLalli commented Mar 8, 2026

The previous code counted a string of errors "0 1 1 1 0" as two flips, two switch errors, and one correct variant. For researchers who want to examine the relative rate of flip events and switch errors unrelated to flips, this is unhelpful.

Instead, I would argue this represents one flip event, three switch errors, two correct variants, and zero "pure switch errors" (ie isolated switch errors that are not associated with a flip error).

This code change calculates flip events, switch errors, pure switch errors, and correct variants. It also calculated how many flip errors occur consecutively (ie, more than two errors in sequence).

Re-done post rebase.

The previous code counted a string of errors "0 1 1 1 0" as two flips, two switch errors, and one correct variant. For researchers who want to examine the relative rate of flip events and switch errors unrelated to flips, this is unhelpful.

Instead, I would argue this represents one flip event, three switch errors, two correct variants, and zero "pure switch errors" (ie isolated switch errors that are not associated with a flip error).

This code change calculates flip events, switch errors, pure switch errors, and correct variants. It also calculated how many flip errors occur consecutively (ie, more than two errors in sequence).
Copilot AI review requested due to automatic review settings March 8, 2026 04:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates haplotype_checker’s per-sample reporting to distinguish flip events from switch errors (including “pure” switch errors) to better support downstream analysis of phasing error patterns.

Changes:

  • Reworks writeFlipSwitchErrorPerSample() to compute flip events, total switch errors, pure switch errors, and correct variants.
  • Adds tracking for “consecutive flip” runs and prints new overall summary rates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +135 to +145
int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;
int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
whole_study_errors += n_all_switches;
whole_study_true_errors += n_pure_switches;
whole_study_checked += total;

fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;
}
fdo.close();
vrb.bullet("#Overall switch error rate = " + stb.str(whole_study_errors * 100.0f / whole_study_checked, 5));
vrb.bullet("#Overall pure switch error rate = " + stb.str(whole_study_true_errors * 100.0f / whole_study_checked, 5));
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple divisions by total / whole_study_checked that can be zero when a sample has no Checked variants (so HET is empty) or when no samples contribute any checked variants. This will produce a divide-by-zero at output time. Please add guards (e.g., emit zeros/NA for that sample and skip adding to whole-study totals; and only print overall rates if whole_study_checked > 0).

Copilot uses AI. Check for mistakes.
Comment on lines +135 to +141
int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;
int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
whole_study_errors += n_all_switches;
whole_study_true_errors += n_pure_switches;
whole_study_checked += total;

fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total is computed as n_all_switches + n_flips + n_correct + n_consecutive_flips, which mixes per-variant switch outcomes (n_all_switches, n_correct) with event-based counts (n_flips, n_consecutive_flips). This inflates the denominator and makes the reported per-sample/overall switch error rates inconsistent with check()/writePerSample() (which use sum(Checked) as the denominator). Consider defining total_checked as HET.size() (or n_all_switches + n_correct) and using that for percentages and whole_study_checked, while keeping flip-event counts as separate columns.

Suggested change
int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;
int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
whole_study_errors += n_all_switches;
whole_study_true_errors += n_pure_switches;
whole_study_checked += total;
fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;
// Number of checked heterozygous variants for this sample
int total_checked = HET.size();
int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
whole_study_errors += n_all_switches;
whole_study_true_errors += n_pure_switches;
whole_study_checked += total_checked;
fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total_checked << " " << stb.str(n_all_switches * 100.0f / total_checked, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total_checked, 2) << " " << stb.str(n_pure_switches * 100.0f / total_checked, 4) << " " << stb.str(n_correct * 100.0f / total_checked, 2) << endl;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants