Description
Sequence data, such as nucleotide or amino acid sequences, are essential for understanding biology. However, analyzing sequencing data and genotype-phenotype associations is challenging because of noise, nonlinear relationships, collinearity, and high dimensionality. Although machine learning (ML) detects patterns in such data effectively, user-friendly tools remain limited. To address this, we developed deepBreaks, an open-source tool that identifies key genomic positions associated with phenotypic traits by comparing multiple ML models. It is available at https://github.com/omicsEye/deepBreaks.
We also use language models to analyze DNA, the oldest language, written through the chemistry of life, and to uncover hidden patterns in the genome. Our models incorporate architectures such as disentangled attention with positional encoding, which improve feature extraction, particularly under limited training conditions. We show that domain-specific pre-training on relevant data significantly improves both accuracy and generalizability. The presentation highlights practical applications of these methods, including microbial species profiling and SARS-CoV-2 genomic dynamics, that demonstrate their versatility in biological analysis.