Hi!
Thank you for introducing Evo 2. It's truly fascinating how you managed to scale your model up to genome-scale generation!
My issue is really just a question, and it relates to the generation of CRISPR-Cas loci. In the previous version, Evo 1, special tokens were assigned to three different classes of Cas proteins (Cas9, Cas12 and Cas13). Is this feature maintained in some form in Evo 2? I ask because I would expect Evo 2 to generate better or more diverse loci, since it was trained on a larger prokaryotic dataset.
If the generation mode is not the same, what would you advise for generating a specific subtype of CRISPR-Cas locus? Would you recommend prompting the model with the corresponding species token (phylogenetic tag), for example the one from the paper, `|D__BACTERIA;P__PSEUDOMONADOTA;C__GAMMAPROTEOBACTERIA;O__ENTEROBACTERALES;F__ENTEROBACTERIACEAE;G__ESCHERICHIA;S__ESCHERICHIA|`, followed by an upstream context sequence?
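For concreteness, here is a minimal sketch of the prompt I have in mind. It only builds the string and does not call Evo 2. The helper names `taxonomy_tag` and `build_prompt` are my own (hypothetical), and I am assuming that the tag is placed directly before the uppercase DNA context with no separator. That placement is exactly what I'm unsure about.

```python
from typing import Sequence

# Rank prefixes in the order used by the tag in the paper (domain -> species).
RANKS = ("D", "P", "C", "O", "F", "G", "S")


def taxonomy_tag(lineage: Sequence[str]) -> str:
    """Build a |D__...;...;S__...| tag from names ordered domain to species."""
    if len(lineage) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} ranks, got {len(lineage)}")
    fields = [f"{rank}__{name.upper()}" for rank, name in zip(RANKS, lineage)]
    return "|" + ";".join(fields) + "|"


def build_prompt(lineage: Sequence[str], upstream_context: str) -> str:
    """Hypothetical prompt: tag immediately followed by the upstream DNA context."""
    seq = upstream_context.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"non-ACGT characters in context: {sorted(invalid)}")
    return taxonomy_tag(lineage) + seq


ecoli = ["Bacteria", "Pseudomonadota", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Escherichia", "Escherichia"]
print(build_prompt(ecoli, "atgaaacgc"))
```

The context here is a placeholder. In practice it would be a real sequence upstream of the target Cas subtype, for example a leader or flanking region from a known locus.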
Thanks a lot in advance!