Here is the vision of my lab, which straddles digital agriculture and computational biology using applied data analytics:
My areas of interest and expertise
The primary focus of my research is applied data analytics and data engineering directed at the domains of digital agriculture and precision health.
On the digital agriculture front, our focus is on data engineering, especially designing algorithms for lightweight in-sensor analytics, partitioning algorithms across computing platforms (sensor → edge → cloud), and designing robust backend databases for high-performance data lakes for IoT in digital agriculture.
On the precision health front, our focus is on the computational genomics and synthetic biology sub-domains.
Here I will provide new dimensions and storylines from the work that we do at ICAN. In general, I will write this so that it is accessible to a general technical audience.
Summer 2020: Work inspired by funding from the Lilly Endowment and the Army Research Laboratory [ARL].
My lab [https://schaterji.io], called Innovatory for Cells and Machines and abbreviated ICAN, started in 2018. Since then we have been branching out into two distinct, albeit interrelated, technical areas: first, Internet-of-Things (IoT) and edge computing for digital agriculture and related IoT application domains; and second, genome engineering. They are related in the sense that my lab develops machine learning-based algorithms for efficient data analytics on the IoT side and for decoding the genome on the genome engineering side. Let me describe our novel approach in these two areas.
IoT and Edge Computing for Digital Agriculture: We are developing algorithms for handling large volumes of IoT data and deriving actionable insights from it. This is relevant in digital agriculture, where data is generated by ground sensors and aerial drones. Our novel approach allows the data to be stored and processed close to its source rather than being ferried all the way to a data center. This is important for preserving the privacy of the IoT data (e.g., it can be kept entirely on the premises) and for ensuring that scarce wireless bandwidth and constrained energy resources are not spent carrying raw and irrelevant data to the backend. As part of our solution approach, we have developed:
- On-device computation: This is the approach where machine learning algorithms, such as object recognition or activity recognition based on video data, are approximated so that they can run on small-form-factor devices.
- Federated machine learning: This is the approach where the data does not need to be centralized to build machine learning models, since different data owners may have privacy misgivings about doing that. Rather, the model is built through coordination among many local models, with data staying private to each data owner. Our approach allows this to be done in a reliable and secure manner, tolerating some of the local clients failing or acting maliciously.
- Distributed databases: The data needs to be stored in distributed databases that can be hosted close to the source of the data. Such a database should be able to scale up or down depending on the load and changing characteristics of the application. Our approach enables this for the dominant class of databases called NoSQL databases. We have two patent-pending technologies in this space, ready to be licensed from Purdue’s OTC.
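To make the federated machine learning item above more concrete, here is a minimal sketch of one well-known way a coordinator can combine local model updates while tolerating a few failed or malicious clients: a coordinate-wise median. This is an illustrative stand-in and not necessarily the technique ICAN uses; the function name `robust_aggregate` and the toy update vectors are assumptions made for this example.

```python
from statistics import median
from typing import List, Optional


def robust_aggregate(client_updates: List[Optional[List[float]]]) -> List[float]:
    """Combine local model updates with a coordinate-wise median.

    A value of None stands for a client that failed to report.
    Hypothetical helper for illustration only.
    """
    # Drop clients that did not respond in this round.
    received = [u for u in client_updates if u is not None]
    if not received:
        raise ValueError("no client updates received")

    dim = len(received[0])
    if any(len(u) != dim for u in received):
        raise ValueError("client updates must have the same length")

    # The median of each coordinate is insensitive to a minority of
    # extreme values, unlike a plain average.
    return [median(u[i] for u in received) for i in range(dim)]


# Toy round: three honest clients, one failed client, one malicious client.
updates = [
    [0.9, 2.1],
    [1.0, 2.0],
    [1.1, 1.9],
    None,               # client that dropped out
    [100.0, -100.0],    # client sending an extreme update
]
global_update = robust_aggregate(updates)
print(global_update)
```

Only the update vectors leave each client in this sketch; the raw data stays with its owner. A plain average of the same updates would be pulled far off by the single extreme client, which is why a robust statistic is used here.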
Genome engineering: Genome reads are still rather error-prone, especially as we look toward newer technologies like nanopore sequencing. ICAN treats these reads as words of a language and, just as Natural Language Processing (NLP) can correct errors in language, we are optimizing NLP techniques to correct genomic reads. Our approach enables the rapidly growing body of work in deep neural networks (DNNs) to be applied to understanding the “language” of the genome, so that we can derive actionable insights from it, such as developing precise genome editing technologies.
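The "reads as words" analogy can be illustrated with a small sketch that splits reads into overlapping k-mers (the "words") and flags k-mers that occur rarely across the read set as likely sequencing errors. This is the classical k-mer frequency idea, shown only as an assumption-laden illustration; it is not the DNN-based approach described above, and the function names, the toy reads, and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def kmers(read: str, k: int) -> List[str]:
    """Split a read into its overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer appears across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def suspicious_kmers(read: str, counts: Counter, k: int, min_count: int) -> List[str]:
    """Return the k-mers of a read seen fewer than min_count times overall."""
    return [km for km in kmers(read, k) if counts[km] < min_count]


# Toy data: three identical reads and one read with a single substitution
# (A -> T at position 4, zero-based).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = kmer_counts(reads, k=4)
flagged = suspicious_kmers(reads[3], counts, k=4, min_count=2)
print(flagged)
```

In this toy set, every k-mer that overlaps the substituted base is rare, so the flagged k-mers localize the probable error. A learned language model would replace the simple count threshold with a prediction of which base is most plausible in context.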