My research aims to develop precision molecular therapeutics that draw on patterns in the “big data” of our physiological signals and of the underlying (epi)genomic markers (the epigenome can be thought of as the regulatory layer atop the genome). To accomplish this, I interrogate gene regulatory networks at different granularities, using several tools from my toolchest: machine learning, nanofabrication, and synthetic biology.
My integrated, systems-based approach to deciphering the language of both healthy and diseased cells combines external attributes (e.g., cell forces, physiology) with genomic features. The goal of this integration is precise, predictive, and personalized interventions for health and longevity.
My approach is interdisciplinary. It combines Computer Engineering, in the form of leading-edge predictive machine learning (ML) algorithms that hack the “language of living cells” and reveal why and how cells malfunction (e.g., through perturbed regulatory networks), with Biological Engineering and Nanoscale Fabrication, which let me validate my hypotheses using engineered nanoscale sensors that detect changes in living cells.
Data Engineering Aspect: In my domain, a single accuracy metric gives an incomplete picture of many real-world situations. In both healthcare and digital agriculture, other outcomes such as privacy, interpretability, and scalability matter as well. I therefore also collaborate with experts in distributed computing and privacy to engineer frameworks that support the entire ecosystem of efficient pipelines for developing, deploying, and testing genomics algorithms. This involves significant data engineering with technologies such as NoSQL databases (e.g., Cassandra, MongoDB), distributed data-processing and streaming frameworks (e.g., Apache Hadoop, Apache Spark, and Apache Kafka), and high-level programming language abstractions and libraries. Many of the algorithms and pipelines we are building aim to hack the regulatory codes of living cells, decipher the patterns and encodings latent in high-dimensional epigenomic data, and then speed up the “learning” process through distributed implementations.
I am also starting to apply some of these data engineering principles to problems in digital agriculture and sustainability, in collaboration with domain scientists at Purdue.