IoT and Scalable Data Lakes
Our lab is interested in developing scalable infrastructure for IoT datasets acquired from heterogeneous
sensors on farms, whether soil sensors or aerial sensors carried by drones. These datasets may need to be
stored long-term in data lakes, or stored in databases that support fast retrieval for low-latency analytics.
Relevant projects:
- Machine learning for scalable databases for dynamic IoT workloads: this involves optimizing both on-premises
  and cloud-hosted databases so that they scale with the dynamic, time-varying workloads that are typical of
  IoT systems. For this, we have optimized both Cassandra and Redis, the latter being an example of an
  in-memory database.
- Optimizations for serverless analytics, the focus of our latest efforts: serverless computing is a form of
  utility computing in which the cloud provider dynamically manages the allocation of machine resources and
  charges for the resources actually used, at a fine granularity, resulting in pay-as-you-go computing. One
  of the caveats of serverless computing is the danger of vendor lock-in. We are developing on top of an open
  framework called OpenLambda.
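
As a rough illustration of what fine-grained, pay-as-you-go billing means in practice, the sketch below computes the charge for a batch of function invocations when each one is rounded up to a billing increment. It is not taken from our systems or from any provider's price list: the function names, the price per GB-second, the memory size, and the billing increments are all hypothetical values chosen for the arithmetic.

```python
from typing import Iterable


def billed_ms(duration_ms: int, granularity_ms: int) -> int:
    """Round one invocation's duration up to the next billing increment."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-duration_ms // granularity_ms) * granularity_ms


def serverless_cost(
    durations_ms: Iterable[int],
    memory_gb: float,
    price_per_gb_s: float,  # hypothetical price, not a real provider rate
    granularity_ms: int = 1,
) -> float:
    """Total charge: billed seconds x memory (GB) x price per GB-second."""
    total_ms = sum(billed_ms(d, granularity_ms) for d in durations_ms)
    return total_ms / 1000 * memory_gb * price_per_gb_s


if __name__ == "__main__":
    # A hypothetical burst: 10,000 invocations of 120 ms each at 0.5 GB.
    burst = [120] * 10_000
    fine = serverless_cost(burst, memory_gb=0.5, price_per_gb_s=0.00002, granularity_ms=1)
    coarse = serverless_cost(burst, memory_gb=0.5, price_per_gb_s=0.00002, granularity_ms=100)
    print(f"1 ms increments:   {fine:.4f}")
    print(f"100 ms increments: {coarse:.4f}")
```

Under these assumed numbers, the coarser increment bills every 120 ms invocation as 200 ms, so the same burst costs noticeably more. This is why billing granularity matters for short-lived analytics functions.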