Description
Thanks for developing fastp!
I wanted to implement deduplication in my pipeline. In my tests, the report shows duplicates being detected, but the number of reads passing the filters is the same whether or not deduplication is applied. However, the output files contain fewer reads when deduplication is applied, which makes sense but does not agree with the report. Could you help me clarify why this discrepancy occurs?
Commands:
# Without deduplication:
fastp \
    -i "$r1" \
    -I "$r2" \
    -o "$outR1" \
    -O "$outR2" \
    --detect_adapter_for_pe \
    --length_required 50 \
    --qualified_quality_phred 20 \
    --unqualified_percent_limit 49 \
    --low_complexity_filter \
    --complexity_threshold 30 \
    --report_title "${sample}_fastp_report" \
    --thread "$SLURM_CPUS_PER_TASK" \
    -j "$json_report" \
    -h "$html_report"

# With deduplication (identical except for the trailing --dedup):
fastp \
    -i "$r1" \
    -I "$r2" \
    -o "$outR1" \
    -O "$outR2" \
    --detect_adapter_for_pe \
    --length_required 50 \
    --qualified_quality_phred 20 \
    --unqualified_percent_limit 49 \
    --low_complexity_filter \
    --complexity_threshold 30 \
    --report_title "${sample}_fastp_report" \
    --thread "$SLURM_CPUS_PER_TASK" \
    -j "$json_report" \
    -h "$html_report" \
    --dedup
Reports:
Both runs report the same number of reads passing filters: 30 765 342 reads / 2 = 15 382 671 read pairs.
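If it helps, this is roughly how I pull those counters out of the JSON report. The key paths below reflect my understanding of fastp's report layout (filtering_result, summary.after_filtering, duplication), so treat them as an assumption:

# Extract the passed-filter count, the post-filtering total, and the
# duplication rate from the fastp JSON report (key paths assumed):
jq '{passed_filter: .filtering_result.passed_filter_reads,
     after_filtering_total: .summary.after_filtering.total_reads,
     duplication_rate: .duplication.rate}' "$json_report"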
Output files:
001_Qfiltering/S_12/S_12_FASTP_R1_001.fastq.gz: 15382671
001_Qfiltering/S_12/S_12_FASTP_R2_001.fastq.gz: 15382671
001_QfilteringDedup/S_12/S_12_FASTP_R1_001.fastq.gz: 13396876
001_QfilteringDedup/S_12/S_12_FASTP_R2_001.fastq.gz: 13396876
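For completeness, a sketch of how these counts can be reproduced from the gzipped outputs, assuming standard four-line FASTQ records:

# Count reads per output file: one FASTQ record = 4 lines
for f in 001_Qfiltering/S_12/S_12_FASTP_R1_001.fastq.gz \
         001_QfilteringDedup/S_12/S_12_FASTP_R1_001.fastq.gz; do
    printf '%s: ' "$f"
    zcat "$f" | awk 'END { print NR / 4 }'
done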