Somali Chaterji's Cybernook

Assistant Professor
Department of Agricultural and Biological Engineering
Purdue University

[How to say my name: SHOH-MAH-LEE CHA-ter-JEE]

Cells and Machines Innovatory

My laboratory, Cells and Machines Innovatory, works at the interface of biological engineering, genomics, and machine learning, tackling challenging problems in precision medicine and digital agriculture. Our broader goals are connected by the theme of ML for the greater good. I believe that by leveraging the power of data using finely crafted learning machines, often adapted for my domains of choice, we can spread health and vitality. For precision medicine, my goal is to understand the software embedded in living cells by writing some of the finest code. Life is computation; in fact, the universe is computation. Inspired by this, I also want to design algorithms and build cyberinfrastructure for next-generation healthcare and agriculture.
Best paper award at ACM BCB '15 (Sep '15; Atlanta, GA)
At Indiana U. for my talk in CS (Oct '17; Bloomington, IN)
NIH R01 project kickoff at Argonne National Lab (Jul '16; Argonne, IL)
With the Seed for Success acorn award given to Purdue investigators winning > $1M grants (Oct '16; W Lafayette, IN)
My first half marathon (Oct '15; W Lafayette, IN)
Finishing the Purdue half marathon (Oct '16; W Lafayette, IN)


Somali Chaterji is an Assistant Professor in the Department of Agricultural and Biological Engineering at Purdue University, where she specializes in developing algorithms and statistical models for biological engineering, precision medicine, and digital agriculture.

Dr. Chaterji received her PhD in Biomedical Engineering from Purdue University, winning the Chorafas International Award (2010), the College of Engineering Best Dissertation Award (2010), and the Future Faculty Fellowship Award (2009). She completed her postdoctoral fellowship in the Department of Biomedical Engineering at the University of Texas at Austin, where her work was supported by an American Heart Association award. She also received the Seed-for-Success Award from Purdue University's Excellence in Research Awards program in recognition of her NIH R01 award in 2016. Dr. Chaterji is a technology commercialization enthusiast and has been consulting for the IC2 Institute at the University of Texas at Austin since Spring 2014.


                Google Hangout: somalichaterji; Skype: schaterji

                Twitter: @SomaliChaterji

Extended Bio (for talks and other engagements)

Research Interests

My current areas of interest and expertise are:

  1. Computational Genomics
  2. Optimization of the Architectures of Nanoscale Sensors
  3. Distributed Infrastructures for Digital Agriculture
  4. Machine Learning for Precision Medicine
  5. Federated Infrastructures for Precision Medicine

Here is a succinct vision statement for my research. [ WWW ]

And here is me presenting a summary of the research activities at Cells and Machines Innovatory. [ YouTube ]

During my postdoctoral work, I honed my expertise in designing and annotating diverse high-throughput genomics experiments, developing and testing nanosensor architectures for cell-based therapeutics, and integrating data from diverse omics datasets. In my PhD work, I developed testing frameworks for optimizing the design and development of cardiovascular devices. I collaborate with both wet-lab experimentalists (UT Austin, U of Washington, UNC Medical School) and computer engineers (Purdue, U of Maryland, U of Chicago).

News

  1. July 2019: One paper accepted at BMC Bioinformatics (on GPU acceleration of CNN-based "learning" of regulatory elements in the genome) and two at Nature Scientific Reports (on natural language processing for deciphering the genome, and on using de Bruijn graphs for faster and more accurate genome assembly with multiple k-mers). Stay tuned for the manuscripts.

  2. July 2019: Cells and Machines goes to ASABE AIM 2019 in Boston, presenting two papers on applications of data science and data engineering in digital Ag. 

  3. July 2019: Our paper on online tuning of NoSQL databases such as Cassandra and Redis was presented at USENIX ATC 2019 in Renton, WA. [ Link ]

  4. June 2019: IFIP Working Group talk on Database tuning for distributed clusters for precision medicine and digital agriculture applications. [ Link ]

  5. May 2019: Three vision papers (teasers) reflecting the research directions of my lab are now online in IEEE Xplore: machines, ML and healthcare, and distributed ledgers (blockchain).

  6. March 2019: New funding from WHIN (Wabash Heartland Innovation Network) and from Purdue's 150 giant leaps vision paper call to my lab, both on digital agriculture innovation and pipelines: [ Tweet 1 ] [ Tweet 2 ]

  7. November 2018: The topics of the COMSNETS panels that I am chairing are: "Man versus Machine: Humans Need Not Apply"; "AI could solve the world's healthcare problems and that too at scale!"; and "Blockchain can be the backbone of India's Economy." [ WWW ]

  8. October 2018: PhD student Ashraf Mahgoub presented our poster at OSDI in Carlsbad, CA. OSDI is the premier computer systems conference. The poster is part of our ongoing NIH project, and in it we show how to configure a distributed NoSQL database to run on heterogeneous cloud computing instances. [ Poster ]

  9. April 2018: Our paper on condition-specific microRNA target prediction using a distributed SVM on top of Apache Hadoop and Spark (finally) appears in IEEE TCBB. [ Paper ]

  10. March 2018: I am serving on the Organizing Committee of the 11th IEEE Comsnets 2019, as the panel co-chair. Consider submitting to the conference to be held in Bangalore, India in January and attending the panels. The conference brings together academics, entrepreneurs, and policy makers in applied computer science fields from all over the world.

News Archive


Here are my recent publications. For the complete list, please visit my Google Scholar page.

All the publications listed below are accompanied by open-source software packages freely available to the research community, with visualization modules where applicable.

  1. Mahgoub, A., Wood, P., Medoff, M., Mitra, S., Meyer, F., Chaterji, S., Bagchi, S., "Sophia: Online Reconfiguration of Clustered NoSQL Databases for Time-Varying Workloads," Accepted to appear at the USENIX Annual Technical Conference (USENIX ATC 2019), pp. 1-13, July 10-12, 2019.

  2. Kumar, R., Buckmaster, D., and Chaterji, S., "Interpretable Modeling for Yield Trends of Corn and Soybean in the Midwest," Accepted to appear at the 2019 Annual Meeting of the American Society of Agricultural and Biological Engineers, pp. 1-6, July 7-10, 2019.

  3. Mahgoub, A. and Chaterji, S., "Iris: Tuning the Configuration Parameters of NoSQL Databases for High-Throughput Digital Agricultural Processing Pipelines," 2019 Annual Meeting of the American Society of Agricultural and Biological Engineers, pp. 1-6, July 7-10, 2019.

  4. Chaterji, S., "Panel 1 Position Paper: Man versus Machines: Humans Need Not Apply," Proceedings of the 11th International Conference on COMmunication Systems & NETworkS, IEEE-COMSNETS, pp. 1-3, 2019.

  5. Chaterji, S., "Panel 2 Position Paper: AI could Solve the World's Healthcare Problems and that too at Scale!," Proceedings of the 11th International Conference on COMmunication Systems & NETworkS, IEEE-COMSNETS, pp. 1-4, 2019.

  6. Nagaraj, M. and Chaterji, S., "Panel 3 Position Paper: Blockchain can be the Backbone of India's Economy," Proceedings of the 11th International Conference on COMmunication Systems & NETworkS, IEEE-COMSNETS, pp. 1-4, 2019.

  7. Ghoshal, A., Zhang, J., Roth, M., Xia, K. M., Grama, A., and Chaterji, S., "A Distributed Classifier for MicroRNA Target Prediction with Validation through TCGA Expression Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), pp. 1-16, April 2018 (DOI: 10.1109/TCBB.2018.2828305). [ Paper ]

  8. Koo J., Zhang J., and Chaterji, S., "Tiresias: Deep Learning Approach to Decipher MicroRNA Regulatory Networks," Theranostics, vol. 8, issue 1, pp. 277-291, 2018. [ Paper ]

  9. Le, V., Lee, J., Chaterji, S., Spencer, A., Liu, Y-L., Kim, P., Yeh, H-C., Kim, D-H., Baker, A. B., "Syndecan-1 in mechanosensing of nanotopological cues in engineered materials," Biomaterials, volume 155, pp. 13-24, February 2018. [ Paper ]

  10. Thomas, T. E., Koo J., Chaterji, S., and Bagchi, S., "Minerva: A reinforcement learning-based technique for optimal scheduling and bottleneck detection in distributed factory operations," At the 10th IEEE Conference on Communication Systems & Networks (COMSNETS), pp. 1-8, Jan 3-7, 2018, Bangalore, India. (Acceptance rate: 38/125 = 30.4%) [ Paper ]

  11. Mahgoub, A., Wood, P., Ganesh, S., Mitra, S. (Adobe Research), Gerlach, W. (Argonne National Laboratory), Harrison, T. (Argonne National Laboratory), Meyer, F. (Argonne National Laboratory), Grama, A., Bagchi, S., and Chaterji, S., “Rafiki: A Middleware for Parameter Tuning of NoSQL Datastores for Dynamic Metagenomics Workloads,” At the ACM/IFIP/USENIX Middleware Conference, pp. 1-13, Dec 11-15, 2017, Las Vegas, Nevada. (Acceptance rate: 20/85 = 23.5%) [ Paper ]

  12. Chaterji, S., Ahn E. H., and Kim D-H., “CRISPR Genome Engineering for Human Pluripotent Stem Cell Research,” Theranostics, vol. 7, pp. 1–25, 2017. [ Paper ]

  13. Theera-Ampornpunt, N. and Chaterji, S., Prediction of Enhancer RNA Activity Levels from ChIP-seq-derived Histone Modification Combinatorial Codes, IEEE Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics (DLB2H 2017), pp. 1–9, Nov 13-16, 2017. [ Paper ]

  14. Kannan, S., Wood, P., Chaterji S., and Bagchi, S., “TopHat: Topology-based Host-Level Attribution for Multi-Stage Attacks in Enterprise Systems using Software Defined Networks,” At the 13th EAI International Conference on Security and Privacy in Communication Networks (SecureComm), pp. 1–22, Oct 23-25, 2017. (Acceptance rate: 31/105 = 29.5% (full papers)) [ Paper ]

  15. Meyer F., Bagchi S., Chaterji S., Gerlach W., Grama A., Harrison T., Paczian T., Trimble W., Wilke A., “MG-RAST Version 4 — Lessons learned from a decade of low-budget ultra-high throughput metagenome analysis,” Oxford Briefings in Bioinformatics, bbx105, pp. 1-9, September 2017. [ Paper ]

  16. Chaterji S., Koo J., Li N., Meyer F., Grama A., and Bagchi S., “Federation in Genomics Pipelines: Techniques and Challenges,” Oxford Briefings in Bioinformatics, bbx102, pp. 1-10, August 2017. [ Paper ]

  17. Mahadik K., Wright C., Kulkarni M., Bagchi S., and Chaterji S., “Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers,” At the 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), pp. 1-7, Aug 20-23, 2017, Boston, MA. [ Paper ]

  18. Kim SG, Harwani M, Grama A, and Chaterji S, “EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm,” In Nature Scientific Reports, pp. 1-21, vol. 6, 2016. [ Paper ]

  19. Kim SG, Ampornpunt N, Fang C-H, Harwani M, Grama A, and Chaterji S. Opening up the blackbox: An interpretable deep neural network-based classifier for cell-type specific enhancer predictions. In the BMC Systems Biology journal, pp. 1-26, 10(2), article 54, 2016. [ Paper ]

  20. Mahadik K, Wright C, Zhang J, Kulkarni M, Bagchi S, and Chaterji S. SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications. At the International Conference on Supercomputing (ICS), pp. 1-13, June 1-3, 2016, Istanbul, Turkey. (Acceptance rate: 32/183 = 17.5%) [ Paper ]

  21. Ampornpunt N, Kim SG, Ghoshal A, Bagchi S, Grama A, and Chaterji S. Computational and network cost of training distributed Support Vector Machines for large genomics data. At the 8th International IEEE Conference on Communication Systems and Networks (COMSNETS), pp. 1-8, January 5-9, 2016, Bangalore, India. (Acceptance rate: 39/143 = 27.3%) [ Paper ]

  22. Ghoshal A, Grama A, Bagchi S, and Chaterji S. An Ensemble SVM Model for the Accurate Prediction of Non-Canonical MicroRNA Targets. In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (BCB), pp. 403-412, September 9-12, 2015, Atlanta, GA. (Acceptance rate: 48/141 = 34%) (Best Paper) [ Paper ] [ Presentation ] [ Recording of the presentation (in wav) ]

  23. Kim S, Ampornpunt N, Grama A, and Chaterji S. Interpretable Deep Neural Networks for Enhancer Prediction. At the 9th IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 242-249, Nov 9-12, 2015. (Acceptance rate: 68/346 = 19.7%) [ Paper ]

  24. Ghoshal A, Shankar R, Bagchi S, Grama A, and Chaterji S. MicroRNA target prediction using thermodynamic and sequence curves. BMC Genomics, vol. 16, no. 1, November 2015. [ Paper ]

  25. Purdue University, Orion: A fine grained parallel implementation for genomic search, At: (January 2015).

  26. Mahadik K, Chaterji S, Zhou, B, Kulkarni, M, and Bagchi, S. Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing), pp. 449-460, Nov 16-21, 2014. (Acceptance rate: 82/394 = 20.8%) [ Paper ]

  27. Chaterji S, Kim P, Choe SH, Ho DS, Tsui JH, Baker AB, Kim DH. Synergistic Effects of Matrix Nanotopography and Stiffness on Vascular Smooth Muscle Cell Function. Tissue Engineering Part A, 2014, 20(15-16): 2115-2126. [ Paper ]

  28. Vedantham K§, Chaterji S§, Kim SW, Park K. Development of a probucol‐releasing antithrombogenic drug eluting stent. Journal of Biomedical Materials Research Part B: Applied Biomaterials 100, no. 4 (2012): 1068-1077. (§ = Equal contribution) [ Paper ]


I am fortunate to work with the following brilliant graduate students:

The following are alums from our group:

And the following stellar undergraduate students work with us:



  1. Do you want to design condition-specific, precise, nucleotide-based therapeutics? How can we predict the targetome of regulatory non-coding RNAs with non-canonical targets?
  2. How do long non-coding RNAs act as enhancers of gene expression?
  3. Can we cluster regions of the non-coding regulatory genome based on their underlying combinatorial signatures?
  4. How can we build parallel algorithms for genomics applications?
  5. Do you want to collaboratively build libraries and a DSL for genomics?

Examples of questions that we are looking to answer are:

  1. How do you predict the function and targets of these non-coding regions using cutting-edge machine learning techniques?
  2. How can we use graph-based methods to extend gene regulatory networks?
  3. How do you become a part of the $1000 genome? Is it for real? Can you be a part of this analysis revolution, analyzing omics data?
  4. Can you precisely edit out the erroneous parts of genomes?

If all of this excites you and you want to be a part of the “omics” revolution, come join us!


Last updated: July 14, 2019