ReverTra is a practical tool for mapping protein (amino-acid) sequences to species-optimized codon sequences. It uses models developed by Tomer Sidi, Tamir Tuller, and Rachel Kolody, built on a deep-learning transformer architecture and trained on mRNA sequences and alignments from four species: S. cerevisiae, S. pombe, E. coli, and B. subtilis. For details on the models, please refer to our paper [ref] and explore the project code here. The project code also includes working notebooks for model inference and data exploration.
This application showcases the evaluation techniques used to assess the accuracy and perplexity of the models presented in the paper. As described there, the sequences of the four species were divided into training, validation, and test sets, with no homologs shared between the sets. Users can run model inference on sequences from the test set and explore the performance on each individual sequence. Users can also download both the sequences and the performance metrics for further investigation.
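As a rough illustration of the two metrics named above, the sketch below computes per-sequence codon accuracy and perplexity using their standard definitions. It assumes the model output is available as a list of predicted codons plus the probability the model assigned to each true codon. The function names `codon_accuracy` and `perplexity` are hypothetical and not part of the ReverTra code, and the exact definitions used in the paper may differ.

```python
import math
from typing import List, Sequence


def codon_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of positions where the predicted codon equals the reference codon.

    Hypothetical helper: assumes both sequences are position-aligned codon lists.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def perplexity(true_codon_probs: List[float]) -> float:
    """Standard perplexity: exp of the mean negative log-probability.

    Hypothetical helper: `true_codon_probs[i]` is assumed to be the probability
    the model assigned to the true codon at position i.
    """
    if not true_codon_probs:
        raise ValueError("at least one probability is required")
    if any(p <= 0.0 or p > 1.0 for p in true_codon_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in true_codon_probs) / len(true_codon_probs)
    return math.exp(mean_nll)
```

Under these definitions, a model that always assigns probability 1 to the true codon has a perplexity of 1, and lower perplexity indicates a better fit to the reference sequence.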
(1) Inference type - Mask/Mimic; the two inference types are presented in the paper. In mask mode, the input to the model is the AA sequence of the target protein. In mimic mode, an additional codon sequence aligned to the target AA sequence is also provided to the model.
(2) Model window size - 10/30/50/75/100/150; in the paper, we present different models trained with different window sizes. Each option in this category selects a different model for prediction.
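The two options above can be pictured as a small request object that is checked before inference. This is only a sketch: `InferenceRequest` and `validate_request` are hypothetical names, not part of the ReverTra code. It also assumes that in mimic mode the aligned codon sequence is supplied as one codon string per amino-acid position, and it does not cover how alignment gaps are represented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Option values taken from the list above.
INFERENCE_TYPES = ("mask", "mimic")
WINDOW_SIZES = (10, 30, 50, 75, 100, 150)


@dataclass
class InferenceRequest:
    """Hypothetical container for the user-selected options and inputs."""
    inference_type: str                        # "mask" or "mimic"
    window_size: int                           # selects which trained model is used
    aa_sequence: str                           # amino-acid sequence of the target protein
    aligned_codons: Optional[List[str]] = None  # mimic mode only (assumed format)


def validate_request(req: InferenceRequest) -> None:
    """Raise ValueError if the options and inputs are inconsistent."""
    if req.inference_type not in INFERENCE_TYPES:
        raise ValueError(f"inference type must be one of {INFERENCE_TYPES}")
    if req.window_size not in WINDOW_SIZES:
        raise ValueError(f"window size must be one of {WINDOW_SIZES}")
    if not req.aa_sequence:
        raise ValueError("the target AA sequence must not be empty")
    if req.inference_type == "mimic":
        # Mimic mode needs the additional aligned codon sequence.
        if req.aligned_codons is None:
            raise ValueError("mimic mode requires an aligned codon sequence")
        # Assumption: one codon per amino-acid position.
        if len(req.aligned_codons) != len(req.aa_sequence):
            raise ValueError("aligned codons must match the AA sequence length")
    elif req.aligned_codons is not None:
        # Mask mode takes only the AA sequence.
        raise ValueError("mask mode does not accept an aligned codon sequence")
```

For example, `validate_request(InferenceRequest("mask", 30, "MKT"))` passes silently, while the same request with a window size of 20 raises a `ValueError`.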