Interpreting the output #14

Hello,

Are you able to give some guidance on how to interpret the output?
For example:
```
INFO:splitStrain.py has started.
INFO:sample name: SAMEA1100847.ERR2509676.recal.bam
INFO:reference name: Chromosome, reference length: 4411532
INFO:regionStart: 100, regionEnd: 4000000
INFO:depth threshold percent: 75
INFO:entropy threshold: 0.0
INFO:using gff: tuberculosis.filtered-intervals.gff
INFO:Likelihood Ratio Statistic: -2*log(LR) = 12495, treshold: 1920
INFO:using the model:GMM
file    alpha   min_LR_thresh   LR_statistic    log-p-value     p-value proportions
SAMEA1100847.ERR2509676.recal.bam       0.05    1920    12495   -14.367 0.000   0.83 0.17
```
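To make the columns easier to read, here is a minimal sketch that splits the summary line into named fields. It assumes every column before `proportions` holds exactly one value, and that `proportions` holds one value per inferred strain (two here: 0.83 and 0.17). `parse_result` is a hypothetical helper, not part of splitStrain.

```python
# Header and data line copied from the splitStrain.py output above.
HEADER = "file    alpha   min_LR_thresh   LR_statistic    log-p-value     p-value proportions"
ROW = "SAMEA1100847.ERR2509676.recal.bam       0.05    1920    12495   -14.367 0.000   0.83 0.17"


def parse_result(header: str, row: str) -> dict:
    """Split the whitespace-separated summary line into named fields.

    Assumption: every column except the last holds a single value, and the
    trailing 'proportions' column holds one value per inferred strain.
    """
    names = header.split()
    values = row.split()
    fixed = len(names) - 1  # number of columns before 'proportions'
    result = {"file": values[0]}
    for name, value in zip(names[1:fixed], values[1:fixed]):
        result[name] = float(value)
    result["proportions"] = [float(v) for v in values[fixed:]]
    return result


parsed = parse_result(HEADER, ROW)
print(parsed["LR_statistic"], parsed["min_LR_thresh"], parsed["proportions"])
# 12495.0 1920.0 [0.83, 0.17]
```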

How should I interpret this? I note that the p-value is 0. Does this mean that multiple strains have been detected with confidence?
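One possible reading of the zero p-value: the `log-p-value` column is -14.367, which suggests the `p-value` column is a tiny positive number rounded to three decimals rather than exactly 0. The output does not say which logarithm base is used, so this sketch computes both the natural-log and base-10 readings; neither assumption is confirmed by the tool.

```python
import math

log_p = -14.367  # 'log-p-value' column from the output above

# The base of the logarithm is not stated in the output, so both readings are shown.
p_if_natural = math.exp(log_p)  # assumes natural log
p_if_base10 = 10 ** log_p       # assumes log10

# Formatted to three decimals, as the 'p-value' column appears to be,
# both values print as 0.000 even though neither is exactly zero.
print(f"{p_if_natural:.3f} {p_if_base10:.3f}")  # 0.000 0.000

# Scientific notation shows the actual magnitudes.
print(f"{p_if_natural:.3e} {p_if_base10:.3e}")
```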

In the manuscript [10.1099/mgen.0.000607](https://doi.org/10.1099%2Fmgen.0.000607) it is mentioned that the ROC curves were generated using the likelihood ratio. Is that equivalent to the `LR_statistic` above? Is there a recommended threshold for the `LR_statistic` to discriminate between pure and mixed infections?
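For reference, the log line `-2*log(LR) = 12495, treshold: 1920` matches the standard likelihood-ratio statistic. This is a minimal sketch of that statistic written in terms of log-likelihoods, plus a comparison against the printed threshold. The decision rule "mixed if the statistic exceeds the threshold" is an assumption (it is what is being asked above), and `lr_statistic` / `exceeds_threshold` are hypothetical helpers, not splitStrain functions.

```python
def lr_statistic(loglik_single: float, loglik_mixture: float) -> float:
    """Standard likelihood-ratio statistic -2*log(L_single / L_mixture).

    In log-likelihoods this is -2*(log L_single - log L_mixture); larger
    values mean the mixture model fits much better than a single strain.
    """
    return -2.0 * (loglik_single - loglik_mixture)


def exceeds_threshold(statistic: float, threshold: float) -> bool:
    """Hypothetical decision rule (an assumption, not confirmed by the tool):
    call the sample mixed when the statistic is above the printed threshold."""
    return statistic > threshold


# Illustrative log-likelihoods, made up and not taken from any real sample.
print(lr_statistic(-1050.0, -1000.0))  # 100.0

# LR_statistic and min_LR_thresh as reported in the output above.
print(exceeds_threshold(12495, 1920))  # True
```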

Thank you.
