ABE 591: Big Data Analytics and HPC

Machine Learning and High-Performance Computing for Digital Agriculture and Biological Engineering
Part 1: Algorithms, resilient data lakes, and analytics at the edge


Course info

Course number: ABE 591

Instructor: Somali Chaterji, PhD
Assistant Professor, Ag and Biological Engineering

Meets once weekly for 2 hours; offered online due to COVID-19
(classroom offerings were scheduled to be in WALC 3121, Wednesdays 6:00 PM - 7:50 PM)

For course-related queries, please feel free to email me with the subject line: "Re: ABE 591: Big Data Analytics and HPC, Part 1"


Course highlights

This interactive course will feature active learning. The instructor will tailor the difficulty of quizzes and course content to each cohort, so that data science and engineering are accessible to all students. There is no prerequisite for the course, but attendance is highly encouraged.

Topics include:

  • “Genomical” and IoT data: concepts and storage
  • ML application stories [Part 1]
  • How to classify?
  • Linear, non-linear, and logistic regression
  • Interpretable ML
  • Data immersion (AI/ML stories) [Part 2]
  • A practitioner's guide to ML (includes the use of real-world datasets)
  • Shallow versus deep networks
  • Activation functions
  • Special topics (pick 2 from the list below; the list will be updated)
    • Data lakes and scalable databases
    • Computing in the cloud
    • Living on the edge (edge computing)
    • GPU acceleration and ASICs
    • Supercomputers and distributed computation
    • Regulatory genomics and epigenomics

Want to know more about the following topics?

  • Deep learning
  • Reward learning
  • GPU accelerators
  • Supercomputers
  • Learning at the edge
  • Resilient data lakes
  • In-sensor analytics
  • Living cells as sensors

This course will provide the conceptual and algorithmic foundations for machine learning (ML) applied to genomics, digital agriculture, and IoT, among other domains. These domains generate terabytes of diverse data. Such data sets can be used for “deciphering the rules of life” or for extracting actionable information from ubiquitous IoT sensors. For example, these sensors may be inexpensive soil sensors that measure the moisture level, pH, or microbial diversity of soil samples. Alternatively, you can think of these sensors as living cells that sense and actuate molecular pathways in response to changes in the cell’s microenvironment.

Overall, this course is meant for both senior undergraduate and graduate students and does not assume any prior knowledge of ML algorithms. Students are expected to be willing to delve into data science and engineering principles and to go beyond the surface-level tools available in various software bundles and packages.

We will discuss domain concepts to lay the groundwork for the kinds of data sets that agricultural and biological engineering students encounter, ranging from cropping systems and geospatial data to genomics, machine systems, and environment and forestry. Along the way, we will introduce concepts, algorithms, and frameworks from ML and high-performance computing (HPC), such as supervised and unsupervised learning, deep learning, feature reduction, and data storage.

This course is part of a Purdue initiative that aims to deliver stackable one-credit courses from which students can build a custom data-science curriculum. These courses will also be offered online: live lectures are recorded through lecture capture for future digitized offerings.