Fix switch/flip error rate calculations by JosephLalli · Pull Request #131 · odelaneau/shapeit5

JosephLalli · 2026-03-08T04:54:51Z

The previous code counted a string of errors "0 1 1 1 0" as two flips, two switch errors, and one correct variant. For researchers who want to examine the relative rate of flip events and switch errors unrelated to flips, this is unhelpful.

Instead, I would argue this represents one flip event, three switch errors, two correct variants, and zero "pure switch errors" (ie isolated switch errors that are not associated with a flip error).

This code change calculates flip events, switch errors, pure switch errors, and correct variants. It also calculated how many flip errors occur consecutively (ie, more than two errors in sequence).

Re-done post rebase.

The previous code counted a string of errors "0 1 1 1 0" as two flips, two switch errors, and one correct variant. For researchers who want to examine the relative rate of flip events and switch errors unrelated to flips, this is unhelpful. Instead, I would argue this represents one flip event, three switch errors, two correct variants, and zero "pure switch errors" (ie isolated switch errors that are not associated with a flip error). This code change calculates flip events, switch errors, pure switch errors, and correct variants. It also calculated how many flip errors occur consecutively (ie, more than two errors in sequence).

Copilot

Pull request overview

Updates haplotype_checker’s per-sample reporting to distinguish flip events from switch errors (including “pure” switch errors) to better support downstream analysis of phasing error patterns.

Changes:

Reworks writeFlipSwitchErrorPerSample() to compute flip events, total switch errors, pure switch errors, and correct variants.
Adds tracking for “consecutive flip” runs and prints new overall summary rates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-08T04:57:56Z

+		int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;
+		int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
+		whole_study_errors += n_all_switches;
+		whole_study_true_errors += n_pure_switches;
+		whole_study_checked += total;
+
+		fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;
 	}
 	fdo.close();
+	vrb.bullet("#Overall switch error rate = " + stb.str(whole_study_errors * 100.0f / whole_study_checked, 5));
+	vrb.bullet("#Overall pure switch error rate = " + stb.str(whole_study_true_errors * 100.0f / whole_study_checked, 5));


There are multiple divisions by total / whole_study_checked that can be zero when a sample has no Checked variants (so HET is empty) or when no samples contribute any checked variants. This will produce a divide-by-zero at output time. Please add guards (e.g., emit zeros/NA for that sample and skip adding to whole-study totals; and only print overall rates if whole_study_checked > 0).

Copilot · 2026-03-08T04:57:56Z

+		int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;
+		int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);
+		whole_study_errors += n_all_switches;
+		whole_study_true_errors += n_pure_switches;
+		whole_study_checked += total;
+
+		fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;


total is computed as n_all_switches + n_flips + n_correct + n_consecutive_flips, which mixes per-variant switch outcomes (n_all_switches, n_correct) with event-based counts (n_flips, n_consecutive_flips). This inflates the denominator and makes the reported per-sample/overall switch error rates inconsistent with check()/writePerSample() (which use sum(Checked) as the denominator). Consider defining total_checked as HET.size() (or n_all_switches + n_correct) and using that for percentages and whole_study_checked, while keeping flip-event counts as separate columns.

Suggested change

int total = n_all_switches + n_flips + n_correct + n_consecutive_flips;

int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);

whole_study_errors += n_all_switches;

whole_study_true_errors += n_pure_switches;

whole_study_checked += total;

fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total << " " << stb.str(n_all_switches * 100.0f / total, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total, 2) << " " << stb.str(n_pure_switches * 100.0f / total, 4) << " " << stb.str(n_correct * 100.0f / total, 2) << endl;

// Number of checked heterozygous variants for this sample

int total_checked = HET.size();

int n_pure_switches = n_all_switches - (n_flips*2 + n_consecutive_flips);

whole_study_errors += n_all_switches;

whole_study_true_errors += n_pure_switches;

whole_study_checked += total_checked;

fdo << H.vecSamples[H.IDXesti[i]] << " " << n_all_switches << " " << n_flips << " " << n_consecutive_flips << " " << n_pure_switches << " " << n_correct << " " << total_checked << " " << stb.str(n_all_switches * 100.0f / total_checked, 2) << " " << stb.str((n_flips*2 + n_consecutive_flips) * 100.0f / total_checked, 2) << " " << stb.str(n_pure_switches * 100.0f / total_checked, 4) << " " << stb.str(n_correct * 100.0f / total_checked, 2) << endl;

Copilot AI review requested due to automatic review settings March 8, 2026 04:54

Copilot started reviewing on behalf of JosephLalli March 8, 2026 04:55 View session

Copilot AI reviewed Mar 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix switch/flip error rate calculations#131

Fix switch/flip error rate calculations#131
JosephLalli wants to merge 1 commit intoodelaneau:mainfrom
JosephLalli:fix-flip-switch-tracking

JosephLalli commented Mar 8, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JosephLalli commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JosephLalli commented Mar 8, 2026 •

edited

Loading