ABE 591: Big Data Analytics and HPC
Machine Learning and High-Performance Computing for Digital Agriculture and Biological
Part 1: Algorithms, resilient data lakes, and analytics at the edge
Course number: ABE 591
Instructor: Somali Chaterji, PhD
Assistant Professor, Ag and Biological Engineering
Once weekly, 2 hours; offered online due to Covid-19
(classroom offerings were scheduled to be in WALC 3121, Wednesdays 6:00 PM - 7:50 PM)
For course-related queries, please feel free to email me with the subject line:
"Re: ABE 591: Big Data Analytics and HPC, Part 1"
- “Genomical” and IoT data: concepts and storage
- ML application stories [Part 1]
- How to classify?
- Linear, non-linear, and logistic regression
- Interpretable ML
- Data immersion (AI/ML stories) [Part 2]
- A practitioner's guide to ML (includes the use of real-world datasets)
- Shallow versus deep networks
- Activation functions
- Special topics (pick 2 from the list below; the list will be updated over time)
- Data lakes and scalable databases
- Computing in the cloud
- Living on the edge (edge computing)
- GPU acceleration and ASICs
- Supercomputers and distributed computation
- Regulatory genomics and epigenomics
Want to know more about the following topics?
- Deep learning
- Reward learning
- GPU accelerators
- Learning at the edge
- Resilient data lakes
- In-sensor analytics
- Living cells as sensors
This course will provide the conceptual and algorithmic foundations for machine learning (ML) applied to
domains such as digital agriculture and IoT. These domains generate data that are both diverse and available
in large volumes. These data sets can be used for “deciphering the rules of life” or to extract actionable
information from ubiquitous IoT sensors. For example, these may be inexpensive soil sensors that measure the
moisture level, pH, or microbial diversity of soil samples. Alternatively, you can think of these sensors as
living cells that sense and actuate molecular pathways in response to changes in the cell’s microenvironment.
Overall, this course is meant for both senior undergraduate and graduate students and does not assume any prior
knowledge of ML algorithms. Students are expected to be willing to delve into data science and engineering
principles and to go beyond the superficial tools available through various software packages.
We will discuss domain concepts to lay the groundwork for the kinds of data sets that students in domains related
to agricultural and biological engineering encounter, ranging from cropping systems and geospatial data to
genomics, machine systems, and environment and forestry. We will sprinkle in concepts, algorithms, and
frameworks for ML and high-performance computing (HPC) tools, such as supervised and unsupervised learning,
[...] learning, feature reduction, and data storage.
This course is part of a Purdue initiative that aims to deliver stackable one-credit courses to create a
data-science curriculum. These courses will also be offered online, through lecture capture during live
sessions, for future digitized offerings.