Currently, the AptaNet notebook uses randomly generated strings, i.e., dummy data.
We should improve the notebook so it shows how to use real data for training and inference.
As discussed, we should change the existing sections of the AptaNet notebook to use the AptaTrans dataset, which has the form str x str -> binding 0/1, i.e., a pair of strings mapped to a binary binding label.
We should also add two new sections:
- where we train on the entire AptaTrans dataset and predict the binding probability (`predict_proba`) between a new pdb file (protein) and a DNA sequence
  - here, simply use any pdb file
  - same for using DNA sequences from a fasta file
- where we use MCTS (Monte Carlo tree search) combined with a trained AptaNet to propose new aptamers for a new pdb file, i.e., a form of in silico SELEX
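
For the first new section, the notebook needs plain sequence strings from both file types. Below is a minimal, standard-library-only sketch of how that could look. The helper names `read_fasta` and `read_pdb_seqres` are hypothetical and are not part of any existing package API. The sketch assumes the pdb file contains SEQRES records and only handles the 20 standard amino acids (anything else becomes `X`). How the resulting strings are passed to the trained model's `predict_proba` is left open here, since that depends on the input format the notebook settles on.

```python
from io import StringIO

# Three-letter to one-letter codes for the 20 standard amino acids.
AA3_TO_1 = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_fasta(handle):
    """Hypothetical helper: map each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def read_pdb_seqres(handle):
    """Hypothetical helper: map each chain ID to its one-letter SEQRES sequence."""
    chains = {}
    for line in handle:
        if line.startswith("SEQRES"):
            chain_id = line[11]              # column 12 of the record
            residues = line[19:70].split()   # columns 20-70 hold residue names
            chains.setdefault(chain_id, []).extend(
                AA3_TO_1.get(res, "X") for res in residues  # unknown -> "X"
            )
    return {key: "".join(parts) for key, parts in chains.items()}


# Usage with an in-memory FASTA file; a real file would be opened with open(path).
fasta_text = ">apt1\nACGT\nacgt\n>apt2\nGGCC\n"
aptamers = read_fasta(StringIO(fasta_text))
```

Note that SEQRES lists the full construct, which can differ from the residues actually resolved in the ATOM records. If that distinction matters for the notebook, a dedicated structure parser would be the safer choice.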