
Deduplication not reported in the html report #638

@fconstancias

Thanks for developing fastp!

I wanted to add deduplication to my pipeline. In my tests, the report does show duplicates, but the number of reads passing filters is the same whether or not deduplication is applied. However, the output files contain fewer reads when deduplication is applied, which makes sense but disagrees with the report. Could you help me understand why this discrepancy occurs?

Commands, first without and then with --dedup:
fastp \
  -i "$r1" \
  -I "$r2" \
  -o "$outR1" \
  -O "$outR2" \
  --detect_adapter_for_pe \
  --length_required 50 \
  --qualified_quality_phred 20 \
  --unqualified_percent_limit 49 \
  --low_complexity_filter \
  --complexity_threshold 30 \
  --report_title "${sample}_fastp_report" \
  --thread "$SLURM_CPUS_PER_TASK" \
  -j "$json_report" \
  -h "$html_report"

fastp \
  -i "$r1" \
  -I "$r2" \
  -o "$outR1" \
  -O "$outR2" \
  --detect_adapter_for_pe \
  --length_required 50 \
  --qualified_quality_phred 20 \
  --unqualified_percent_limit 49 \
  --low_complexity_filter \
  --complexity_threshold 30 \
  --report_title "${sample}_fastp_report" \
  --thread "$SLURM_CPUS_PER_TASK" \
  -j "$json_report" \
  -h "$html_report" \
  --dedup

Reports:

[Two screenshots of the HTML reports, not reproduced here]

Reads passed filters, identical in both reports: 30 765 342 / 2 = 15 382 671 read pairs
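The halving above can be checked with a minimal shell sketch. It assumes, as the division by 2 implies, that the paired-end report counts R1 and R2 reads together; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: convert the report's read count into read pairs.
# Assumption: the paired-end report counts R1 and R2 reads together,
# so the number of pairs is half the "reads passed filters" value.
reported_reads=30765342   # value shown in both HTML reports

pairs=$(( reported_reads / 2 ))
echo "read pairs implied by the report: $pairs"
```

The result equals the read count of each non-deduplicated output file, so the report agrees with the run without --dedup but not with the run that uses it.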

Output files (reads per file):
001_Qfiltering/S_12/S_12_FASTP_R1_001.fastq.gz: 15382671
001_Qfiltering/S_12/S_12_FASTP_R2_001.fastq.gz: 15382671
001_QfilteringDedup/S_12/S_12_FASTP_R1_001.fastq.gz: 13396876
001_QfilteringDedup/S_12/S_12_FASTP_R2_001.fastq.gz: 13396876
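Per-file counts like the ones above can be reproduced with a small shell sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines) and `gzip` on the PATH. The function name `count_reads` and the script usage are hypothetical, not part of fastp. The last lines compute how many pairs the --dedup run removed, a difference the report does not show.

```shell
#!/usr/bin/env bash
# Sketch: count reads in gzipped FASTQ files.
# Assumption: every record spans exactly 4 lines (header, sequence, '+', quality).
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical usage: ./count_reads.sh R1.fastq.gz R2.fastq.gz ...
for f in "$@"; do
  printf '%s: %s\n' "$f" "$(count_reads "$f")"
done

# Pairs removed by --dedup for sample S_12 (R1 and R2 counts are identical).
nodedup=15382671   # reads per file without --dedup
dedup=13396876     # reads per file with --dedup
echo "pairs removed by --dedup: $(( nodedup - dedup ))"
```

With the S_12 counts, the difference is 1 985 795 pairs, about 12.9% of the filtered pairs.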
