Interpreting the output #14

@jemunro

Hello,

Are you able to give some guidance on how to interpret the output?
For example:

INFO:splitStrain.py has started.
INFO:sample name: SAMEA1100847.ERR2509676.recal.bam
INFO:reference name: Chromosome, reference length: 4411532
INFO:regionStart: 100, regionEnd: 4000000
INFO:depth threshold percent: 75
INFO:entropy threshold: 0.0
INFO:using gff: tuberculosis.filtered-intervals.gff
INFO:Likelihood Ratio Statistic: -2*log(LR) = 12495, treshold: 1920
INFO:using the model:GMM
file    alpha   min_LR_thresh   LR_statistic    log-p-value     p-value proportions
SAMEA1100847.ERR2509676.recal.bam       0.05    1920    12495   -14.367 0.000   0.83 0.17

How should I interpret this? I note that the p-value is 0. Does this mean that multiple strains are detected with confidence?
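To make the question concrete, here is a minimal Python sketch of how the summary row above could be read. It is not part of splitStrain: the `parse_summary` helper is hypothetical, the column layout is taken only from the pasted output, and two points are assumptions rather than documented behaviour: that an `LR_statistic` above `min_LR_thresh` indicates a mixed sample, and that `log-p-value` is either a natural or a base-10 logarithm (both are computed because the output does not say which).

```python
import math

# Header and row copied from the output above (whitespace-separated).
HEADER = "file alpha min_LR_thresh LR_statistic log-p-value p-value proportions"
ROW = "SAMEA1100847.ERR2509676.recal.bam 0.05 1920 12495 -14.367 0.000 0.83 0.17"


def parse_summary(header, row):
    """Hypothetical helper: map column names to values for one summary row.

    Every column before 'proportions' holds a single value; all remaining
    fields are treated as the strain proportions.
    """
    names = header.split()
    values = row.split()
    fixed = names[:-1]
    record = dict(zip(fixed, values[:len(fixed)]))
    for key in fixed[1:]:  # every fixed column except 'file' is numeric
        record[key] = float(record[key])
    record["proportions"] = [float(v) for v in values[len(fixed):]]
    return record


rec = parse_summary(HEADER, ROW)

# Assumption: a statistic above the reported threshold suggests a mixed sample.
exceeds_threshold = rec["LR_statistic"] > rec["min_LR_thresh"]

# Assumption: the log base is unknown, so both candidates are shown.
p_if_natural_log = math.exp(rec["log-p-value"])
p_if_base10_log = 10 ** rec["log-p-value"]

print("LR_statistic exceeds threshold:", exceeds_threshold)
print("p-value if natural log:", p_if_natural_log)
print("p-value if base-10 log:", p_if_base10_log)
print("proportions:", rec["proportions"])
```

Under either log base the p-value is far below 0.0005, so the printed `0.000` looks like rounding to three decimals rather than an exact zero.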

The manuscript (10.1099/mgen.0.000607) mentions that the ROC curves are generated using the likelihood ratio. Is that equivalent to the LR_statistic above? Is there a recommended threshold for the LR_statistic to discriminate between pure and mixed infections?

Thank you.
