Hi,
Thanks for this fast tool! I am comparing the results of seqpare and igd search -s. Based on the documentation, I expected the two to give identical results, but when I ran both on the same files the similarity scores differ: seqpare reports roughly 3 to 4 times higher similarity than igd.
Is it a bug in seqpare or igd?
seqpare
seqpare 'index/*.bed.gz' 'bedfile.bed.gz' -m 1 -o 'result.txt' gives:
N1     N2    teo            tc    sm
1424   6464  31.39503670    126   0.00399601  bed1.bed.gz
10370  6464  257.09466553   885   0.01550921  bed2.bed.gz
12568  6464  321.13378906   976   0.01716296  bed3.bed.gz
7436   6464  500.28283691   2206  0.03733533  bed4.bed.gz
1879   6464  145.14971924   431   0.01770583  bed5.bed.gz
20677  6464  2170.07275391  3587  0.08690397  bed6.bed.gz
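The seqpare rows above are consistent with sm = teo / (N1 + N2 - teo), i.e. a Jaccard-style ratio built on the total effective overlap rather than on the raw hit count tc. This is inferred from the numbers, not read from the seqpare source. The short Python sketch below recomputes sm from the other columns; the helper name overlap_jaccard is hypothetical, and the from_tc column (same ratio on raw hit counts) is only there for comparison.

```python
# seqpare rows copied from result.txt above: (N1, N2, teo, tc, sm, file).
SEQPARE_ROWS = [
    (1424, 6464, 31.39503670, 126, 0.00399601, "bed1.bed.gz"),
    (10370, 6464, 257.09466553, 885, 0.01550921, "bed2.bed.gz"),
    (12568, 6464, 321.13378906, 976, 0.01716296, "bed3.bed.gz"),
    (7436, 6464, 500.28283691, 2206, 0.03733533, "bed4.bed.gz"),
    (1879, 6464, 145.14971924, 431, 0.01770583, "bed5.bed.gz"),
    (20677, 6464, 2170.07275391, 3587, 0.08690397, "bed6.bed.gz"),
]


def overlap_jaccard(n1: int, n2: int, overlap: float) -> float:
    """Hypothetical helper: Jaccard-style ratio overlap / (N1 + N2 - overlap)."""
    return overlap / (n1 + n2 - overlap)


for n1, n2, teo, tc, sm, name in SEQPARE_ROWS:
    from_teo = overlap_jaccard(n1, n2, teo)  # agrees with the reported sm
    from_tc = overlap_jaccard(n1, n2, tc)    # same ratio on raw hit counts
    print(f"{name}: sm={sm:.8f} from_teo={from_teo:.8f} from_tc={from_tc:.8f}")
```

Because teo is smaller than tc in every row, the count-based ratio is always larger than sm, so seqpare's sm is not a plain count-based Jaccard index.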
igd search -s
igd search index.igd -q bedfile.bed.gz -s gives:
index  number of regions  similarity  dataset name
0      1424               0.001194    bed1.bed.gz
1      10370              0.004325    bed2.bed.gz
2      12568              0.004321    bed3.bed.gz
3      7436               0.010286    bed4.bed.gz
4      1879               0.004843    bed5.bed.gz
5      20677              0.022865    bed6.bed.gz
igd search
igd search index.igd -q bedfile.bed.gz gives:
index  number of regions  number of hits  File_name
0      1424               126             bed1.bed.gz
1      10370              885             bed2.bed.gz
2      12568              976             bed3.bed.gz
3      7436               2206            bed4.bed.gz
4      1879               431             bed5.bed.gz
5      20677              3587            bed6.bed.gz
Here seqpare's "tc" column matches igd's "number of hits" exactly for every file, but seqpare's "sm" and igd's "similarity" columns differ.
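To quantify the gap, the Python sketch below uses only the values copied from the three outputs above. It checks that tc equals igd's hit count for every file and prints the ratio of seqpare's sm to igd's similarity. It makes no assumption about how igd computes its similarity score.

```python
# Values copied from the outputs above:
# (file, seqpare tc, seqpare sm, igd "number of hits", igd -s "similarity")
ROWS = [
    ("bed1.bed.gz", 126, 0.00399601, 126, 0.001194),
    ("bed2.bed.gz", 885, 0.01550921, 885, 0.004325),
    ("bed3.bed.gz", 976, 0.01716296, 976, 0.004321),
    ("bed4.bed.gz", 2206, 0.03733533, 2206, 0.010286),
    ("bed5.bed.gz", 431, 0.01770583, 431, 0.004843),
    ("bed6.bed.gz", 3587, 0.08690397, 3587, 0.022865),
]

# Ratio of seqpare's sm to igd's similarity, per database file.
ratios = {name: sm / igd_sim for name, _, sm, _, igd_sim in ROWS}

for name, tc, sm, hits, igd_sim in ROWS:
    same = "yes" if tc == hits else "NO"
    print(f"{name}: tc == hits? {same}; sm / igd similarity = {ratios[name]:.2f}")
```

Every ratio falls between about 3.3 (bed1) and 4.0 (bed3), so the difference is not a single constant factor.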
BTW, would it be possible for igd to output the "teo" column as well?
Also, igd's "-o" option does not seem to work at the moment. If I redirect stdout with ">" instead, the trailing "Total counts: XXXX" line also ends up in the file, which makes the output awkward to use in downstream steps. "head -n -1" works around this, but it is not very intuitive. Maybe the final line could be written to stderr instead?
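Until the summary line moves to stderr, a filter that drops it by content rather than by position avoids depending on it being the very last line, which is what "head -n -1" assumes. A minimal Python sketch follows. It assumes the summary line always starts with "Total counts:" as described above, and the script name drop_total_counts.py is hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO

SUMMARY_PREFIX = "Total counts:"  # assumed prefix of igd's trailing summary line


def drop_total_counts(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except the 'Total counts: ...' summary line."""
    for line in lines:
        if not line.startswith(SUMMARY_PREFIX):
            yield line


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst, skipping the summary line."""
    dst.writelines(drop_total_counts(src))


if __name__ == "__main__":
    # Each argument is a file to filter; "-" means standard input.
    for path in sys.argv[1:]:
        if path == "-":
            filter_stream(sys.stdin, sys.stdout)
        else:
            with open(path) as fh:
                filter_stream(fh, sys.stdout)
```

Usage: igd search index.igd -q bedfile.bed.gz | python3 drop_total_counts.py - > hits.txt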
Best,
Xin