Research Interests
Here is the vision of my lab, which straddles digital agriculture and computational biology using applied data analytics:
ICAN's vision at the interface of living cells and artificial cells (aka. sensor nodes or the things of IoT)
I lead the “Innovatory of Cells and Neural Machines” [ICAN] at Purdue University. ICAN is an applied data science and engineering lab working on two thrusts: computational genomics and IoT.
These two thrusts are connected in two ways: by the tools and data engineering platforms that we develop to enable precision health and IoT algorithms, and by my vision of mapping the energy efficiency of living cells to the artificial sensor nodes that are deployed ubiquitously in farms and factories. My lab is driven to learn and interchange tools between these two thrusts, with the goal of creating new algorithms to understand the genome of living cells, on the one hand, and to enable IoT sensors to perceive and actuate, on the other. I am excited to have a lab that translates lab-scale algorithms to practice: we have been developing testbeds to run energy-aware variants of our algorithms on the microcontrollers and Jetsons of the world. This enables adaptive algorithms to interface automatically with the hardware and with the context, whether it be the context of the video (dynamic versus more static video frames) or the context of the disease (where we develop local models for the different conditions). I have a stellar team of undergraduate and graduate students working with me, who are building expertise both in developing whitebox algorithms and in federating or distributing those algorithms across devices or servers. I love working with practitioners to test my algorithms “in the wild”.
My areas of interest and expertise
The primary focus of my research is applied data analytics and data engineering directed at the domains of digital agriculture and precision health.
On the IoT and digital agriculture front, our focus is on the data engineering side, especially designing algorithms for lightweight in-sensor analytics, partitioning algorithms across different platforms of computation (sensor → edge → cloud), and designing robust backend databases for high-performance data lakes for IoT in digital agriculture.
On the precision health and genome engineering front, our focus is on the computational genomics and synthetic biology sub-domains.
Research Updates
Here I will provide new dimensions and storylines from the work that we do at ICAN. In general, I will write this so that it is accessible to a general technical audience.
Summer 2020: Work inspired by funding from the Lilly Endowment and Army Research Laboratories [ARL].
My lab [https://schaterji.io], called Innovatory for Cells and Machines and abbreviated ICAN, got started in 2018. Since then, we have been branching out in two distinct, albeit inter-related, technical areas: first, Internet-of-Things (IoT) and edge computing for digital agriculture and other related IoT application domains, and second, genome engineering. They are related in the sense that my lab develops machine learning-based algorithms for efficient data analytics on the IoT side and for decoding the genome on the genome engineering side. Let me describe our novel approach in these two areas.
IoT and Edge Computing for Digital Agriculture: We are developing algorithms to handle large volumes of IoT data and to derive actionable insights from them. This is relevant in digital agriculture, where data is generated by ground sensors and aerial drones. Our novel approach allows the data to be stored and processed close to its source rather than being ferried all the way to a data center. This is important to ensure the privacy of the IoT data (e.g., it can be made to stay completely within the premises) and to ensure that scarce wireless bandwidth and constrained energy resources are not spent carrying raw and irrelevant data to the backend. As part of our solution approach, we have developed:
- On-device computation: This is the approach where machine learning algorithms, such as object recognition or activity recognition based on video data, are approximated so that they can run on small form factor devices.
- Federated machine learning: This is the approach where the data does not all need to be centralized for building machine learning models, since different data owners may have privacy misgivings about doing that. Rather, the model can be built through coordination among many local models, with data staying private to each data owner. Our approach allows this to be done in a reliable and secure manner, tolerating some of the local clients failing or acting maliciously.
- Distributed databases: The data needs to be stored in distributed databases that can be hosted close to the source of the data. Such a database should be able to scale up or down depending on the load and changing characteristics of the application. Our approach enables this for the dominant class of databases called NoSQL databases. We have two patent-pending technologies in this space, ready to be licensed from Purdue’s OTC.
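To make the federated learning idea above concrete for a general technical reader, here is a minimal sketch of one well-known way to combine local model updates while tolerating failed or malicious clients: the coordinate-wise median. This is an illustrative assumption and not necessarily the method ICAN uses; the function name `robust_aggregate` and the toy update vectors are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def robust_aggregate(updates: Sequence[Optional[Sequence[float]]]) -> List[float]:
    """Combine client updates with a coordinate-wise median.

    Illustrative assumption: a failed client is represented by None, and a
    malicious client by an arbitrary (possibly extreme) update vector.
    """
    # Drop clients that failed to report; raw data never leaves the clients,
    # only their model updates are seen here.
    received = [u for u in updates if u is not None]
    if not received:
        raise ValueError("no client updates received")
    dim = len(received[0])
    if any(len(u) != dim for u in received):
        raise ValueError("client updates have mismatched dimensions")
    # The median of each coordinate is not dragged away by a minority of
    # extreme values, unlike a plain average.
    return [median(u[i] for u in received) for i in range(dim)]


# Hypothetical round: four honest clients, one failed, one malicious.
client_updates = [
    [0.9, 2.1],
    [1.0, 2.0],
    [1.1, 1.9],
    [1.2, 1.8],
    None,            # client that failed to report
    [100.0, -50.0],  # client sending an extreme update
]
global_update = robust_aggregate(client_updates)
```

In this toy round the aggregated update stays within the range of the honest clients, whereas a plain average would be pulled far off by the single extreme vector.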
Genome engineering: Genome reads are still rather error prone, especially as we look toward newer technologies, like nanopore sequencing. ICAN treats these reads as words of a language: just as Natural Language Processing (NLP) can correct errors in language, we are optimizing NLP technologies to correct genomic reads. Our approach allows the flourishing body of work on deep neural networks (DNNs) to be applied to understanding the “language” of the genome, so that we can derive actionable insights from it, such as developing precise genome editing technologies.
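As a small illustration of the "reads as words" analogy, the sketch below splits reads into overlapping k-mers, a common way to turn a DNA sequence into word-like tokens that language models can consume. This is an assumption made for illustration, not a description of ICAN's actual pipeline; `kmer_tokens`, the choice of k = 3, and the example reads are hypothetical.

```python
from collections import Counter
from typing import List


def kmer_tokens(read: str, k: int = 3) -> List[str]:
    """Split a read into overlapping k-mers, the 'words' of the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    read = read.upper()
    # A read shorter than k yields no tokens.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Hypothetical reads; real reads are far longer and noisier.
reads = ["ACGTAC", "CGTACG"]

# Count how often each k-mer appears across all reads, much like building
# a vocabulary with word frequencies in NLP.
vocabulary = Counter(token for read in reads for token in kmer_tokens(read))
```

A k-mer that appears very rarely compared with its neighbors is a candidate sequencing error, which is the same intuition behind spotting a misspelled word in text.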