IoT and Scalable Data Lakes
Our lab is interested in developing scalable infrastructure for IoT datasets acquired from heterogeneous sensors on farms, ranging from in-soil sensors to aerial sensors mounted on drones. Depending on how they are used, these datasets may need to be archived long term in data lakes or held in databases that support fast retrieval for low-latency analytics.
Relevant projects:
- Machine learning for scalable databases under dynamic IoT workloads: this involves optimizing both on-premises and cloud-hosted databases so that they scale with the time-varying workloads typical of IoT systems. For this, we have optimized both Cassandra and Redis, the latter being an example of an in-memory database.
- Optimizations for serverless analytics, our latest effort: serverless computing is a form of utility computing in which the cloud provider dynamically manages the allocation of machine resources and bills for the resources actually used, at fine granularity, which results in pay-as-you-go computing. One caveat of serverless computing is the danger of vendor lock-in, so we are developing on top of an open framework called OpenLambda.
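
To make the pay-as-you-go point concrete, the following is a minimal sketch of how fine-grained billing compares with an always-on server for a bursty IoT workload. It is an illustration only, not our tooling and not any provider's actual pricing model: the function names, the prices, the memory size, and the billing granularity are all hypothetical assumptions.

```python
import math


def serverless_cost(invocation_ms, memory_gb, price_per_gb_second,
                    billing_granularity_ms=1):
    """Cost when each invocation is billed only for the time it runs.

    Each duration is rounded up to the billing granularity, then charged
    in proportion to the (hypothetical) memory allocation.
    """
    total = 0.0
    for ms in invocation_ms:
        billed_ms = math.ceil(ms / billing_granularity_ms) * billing_granularity_ms
        total += (billed_ms / 1000.0) * memory_gb * price_per_gb_second
    return total


def provisioned_cost(hours, price_per_hour):
    """Cost of an always-on server, paid for whether or not it is busy."""
    return hours * price_per_hour


if __name__ == "__main__":
    # Hypothetical bursty workload: 10,000 sensor-triggered invocations
    # of 120 ms each, spread over one day.
    durations = [120] * 10_000

    # All prices below are made-up placeholders for illustration.
    pay_per_use = serverless_cost(durations, memory_gb=0.5,
                                  price_per_gb_second=0.00002)
    always_on = provisioned_cost(hours=24, price_per_hour=0.05)

    print(f"serverless (pay-as-you-go): ${pay_per_use:.4f}")
    print(f"provisioned (always on):    ${always_on:.4f}")
```

The comparison depends on the workload: when the function is busy most of the time, the always-on server can come out cheaper, which is one reason time-varying IoT workloads are a natural fit for this kind of analysis.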