igd/seqpare result discrepancies #26

@xindongnov

Hi,

Thanks for this fast tool! I am comparing the results of seqpare with those of the igd search -s function. Based on the documentation, I expected the two to give identical results, but when I ran both, the results differ: the similarity reported by seqpare is roughly 3 to 4 times higher than the one reported by igd.

Is it a bug in seqpare or igd?

seqpare
seqpare 'index/*.bed.gz' 'bedfile.bed.gz' -m 1 -o 'result.txt' gives:

N1	N2	teo	tc	sm
1424	6464	 31.39503670	126	  0.00399601	bed1.bed.gz
10370	6464	257.09466553	885	  0.01550921	bed2.bed.gz
12568	6464	321.13378906	976	  0.01716296	bed3.bed.gz
7436	6464	500.28283691	2206	  0.03733533	bed4.bed.gz
1879	6464	145.14971924	431	  0.01770583	bed5.bed.gz
20677	6464	2170.07275391	3587	  0.08690397	bed6.bed.gz

igd search -s
igd search index.igd -q bedfile.bed.gz -s gives:

index	 number of regions	 similarity	 dataset name
0	1424	  0.001194	bed1.bed.gz
1	10370	  0.004325	bed2.bed.gz
2	12568	  0.004321	bed3.bed.gz
3	7436	  0.010286	bed4.bed.gz
4	1879	  0.004843	bed5.bed.gz
5	20677	  0.022865	bed6.bed.gz

igd search
igd search index.igd -q bedfile.bed.gz gives:

index	 number of regions	 number of hits	 File_name
0	1424	126	bed1.bed.gz
1	10370	885	bed2.bed.gz
2	12568	976	bed3.bed.gz
3	7436	2206	bed4.bed.gz
4	1879	431	bed5.bed.gz
5	20677	3587	bed6.bed.gz

Here the "tc" column of seqpare and the "number of hits" column of igd search agree, but the "sm" column of seqpare and the "similarity" column of igd search -s do not.
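
To quantify the gap, here is a small Python sketch that uses only the numbers pasted above. It checks whether seqpare's "sm" column is consistent with teo / (N1 + N2 - teo), and prints the ratio between seqpare's "sm" and igd's "similarity" for each file. The formula is inferred from the output values, not read from the source code, so treat it as an assumption; the variable names are just for illustration.

```python
# Values copied from the seqpare and `igd search -s` outputs above.
# Columns: file name, N1, N2, teo, seqpare sm, igd similarity
rows = [
    ("bed1.bed.gz", 1424, 6464, 31.39503670, 0.00399601, 0.001194),
    ("bed2.bed.gz", 10370, 6464, 257.09466553, 0.01550921, 0.004325),
    ("bed3.bed.gz", 12568, 6464, 321.13378906, 0.01716296, 0.004321),
    ("bed4.bed.gz", 7436, 6464, 500.28283691, 0.03733533, 0.010286),
    ("bed5.bed.gz", 1879, 6464, 145.14971924, 0.01770583, 0.004843),
    ("bed6.bed.gz", 20677, 6464, 2170.07275391, 0.08690397, 0.022865),
]

# Assumed formula (inferred from the numbers): sm = teo / (N1 + N2 - teo)
jaccards = [teo / (n1 + n2 - teo) for _, n1, n2, teo, _, _ in rows]

# Ratio of seqpare's similarity to igd's similarity for each file
ratios = [sm_seqpare / sm_igd for _, _, _, _, sm_seqpare, sm_igd in rows]

for (name, *_rest), jaccard, ratio in zip(rows, jaccards, ratios):
    sm_seqpare = _rest[3]
    print(f"{name}\trecomputed={jaccard:.8f}\tseqpare={sm_seqpare:.8f}\tratio={ratio:.2f}")
```

With these inputs the recomputed values match seqpare's "sm" column, while the seqpare/igd ratio is not constant across files (it falls between about 3.3 and 4.0), so the difference does not look like a simple fixed scaling factor.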

BTW, would it be possible for igd to support the "teo" column in its output as well?

Also, I found that igd's '-o' option does not work at the moment. If I redirect the output with ">", the final "Total counts: XXXX" line is also written to the file, which makes the file awkward to use in downstream processing. "head -n -1" works as a workaround, but it is not very intuitive. Maybe it would be better to print that final line to stderr?
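
For anyone hitting the same problem in a Python pipeline, below is a minimal sketch of the workaround. It drops the summary line by its prefix rather than by position. It assumes the line starts with exactly "Total counts:" as it appears in my output; `strip_total_line` and `sample` are hypothetical names, and the "XXXX" placeholder is kept as-is.

```python
def strip_total_line(text, prefix="Total counts:"):
    """Return the text without any line that starts with the summary prefix."""
    kept = [line for line in text.splitlines() if not line.startswith(prefix)]
    return "\n".join(kept)


# Shortened sample in the shape of the `igd search` output above,
# with the trailing summary line (assumed format) appended.
sample = (
    "index\t number of regions\t number of hits\t File_name\n"
    "0\t1424\t126\tbed1.bed.gz\n"
    "1\t10370\t885\tbed2.bed.gz\n"
    "Total counts: XXXX\n"
)

print(strip_total_line(sample))
```

Matching on the prefix avoids silently dropping a real data row if a future version stops printing the summary line, which "head -n -1" would do.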

Best,
Xin
