Short description of the project(s):
- Our lab’s Biological Engineering thrust centers on applied machine learning for pattern matching in the genome and on data engineering infrastructure for large-scale genomics analysis. For the latter, we have an ongoing National Institutes of Health (NIH) grant with Argonne National Laboratory on hardening MG-RAST, a best-in-class metagenomics analysis pipeline. Essentially, it is a database engineering project in which we increase the throughput of the MG-RAST pipeline.
- On the applied machine learning front, we have applied deep neural networks to find transcriptional regulatory regions such as enhancers. These regions regulate the kind (upregulation or downregulation) and the level of gene expression, which in turn influences our health. We are also interested in regulatory RNAs such as microRNAs (miRNAs), which form many-to-many networks with genes and often work in groups to regulate groups of genes in a context-specific manner.
- Finally, we are also working on cutting-edge techniques for error correction in genomes, both in short reads and in long reads such as PacBio and Nanopore reads. For the latter, we are using Natural Language Processing (NLP)-based techniques to capture the spatial complexity of genomes, and we use NLP-derived metrics in data-driven techniques that find the best configuration parameters for error correction.
- Read more here: https://schaterji.io/projects