Lee, J., Wang, P., Xu, R., Dasari, V., Weston, N., Li, Y., Bagchi, S., and Chaterji, S. (2021). Benchmarking Video Object Detection Systems on Embedded Devices under Resource Contention. Embedded and Mobile Deep Learning (EMDL), co-located with MobiSys 2021.


Jayoung Lee, Pengcheng Wang, Ran Xu

Adaptive and efficient computer vision systems have been proposed to optimize computer vision tasks, e.g., object classification and object detection, for embedded boards or mobile devices. These studies focus on optimizing the model (deep network) or the system itself, either by designing an efficient network architecture or by adapting the network architecture at runtime using approximation knobs such as image size, the type of object tracker, and the head of the object detector (e.g., lighter-weight one-shot detectors like YOLO over two-shot detectors like FRCNN). In this work, we benchmark different video object detection protocols, including FastAdapt, with respect to accuracy, latency, and energy consumption on three embedded boards that represent leading-edge mobile GPUs. Our set of protocols consists of Faster R-CNN, YOLOv3, SELSA, MEGA, and REPP. Further, we characterize their performance under different levels of resource contention, specifically GPU contention, as would arise from co-located applications on these boards contending with the video object detection task. Our first key insight is that object detectors have to be coupled with trackers to keep up with latency requirements (e.g., 30 fps); with this coupling, FastAdapt achieves up to 76 fps on the most well-resourced NVIDIA Jetson-class board, the NVIDIA AGX Xavier. Second, adaptive protocols like FastAdapt, FRCNN, and YOLO (specifically our adaptive variants, FRCNN+ and YOLO+) work well under resource constraints. Third, among the latest video object detection heads, SELSA achieves the highest accuracy, but at a latency of over 2 seconds per frame. Finally, our energy-consumption experiments show that FastAdapt, adaptive FRCNN, and adaptive YOLO are best in class relative to the non-adaptive protocols SELSA, MEGA, and REPP.
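The detect-then-track idea in the abstract can be illustrated with a minimal scheduling sketch. This is not FastAdapt's actual implementation: the costs, the `process_stream` function, and the fixed-interval policy are all hypothetical stand-ins. It simply shows why interleaving a cheap tracker between expensive detector invocations lets a pipeline meet a per-frame latency budget (e.g., 33 ms for 30 fps) that the detector alone would blow past.

```python
def process_stream(frames, detect_cost_ms, track_cost_ms, budget_ms):
    """Hypothetical sketch: pick the smallest detection interval k such
    that running the heavy detector every k-th frame, with the cheap
    tracker on the other k-1 frames, fits the average latency budget.

    Average per-frame cost with one detection every k frames:
        (detect_cost_ms + (k - 1) * track_cost_ms) / k
    """
    interval = 1
    while (detect_cost_ms + (interval - 1) * track_cost_ms) / interval > budget_ms:
        interval += 1
    # Schedule of per-frame work: detect on frame 0, k, 2k, ...; track otherwise.
    schedule = ["detect" if i % interval == 0 else "track" for i in range(frames)]
    return interval, schedule


# Example (assumed costs): a 100 ms detector and a 10 ms tracker under a
# 33 ms (30 fps) budget force a detection only every 4th frame, since
# (100 + 3 * 10) / 4 = 32.5 ms <= 33 ms.
interval, schedule = process_stream(8, detect_cost_ms=100, track_cost_ms=10,
                                    budget_ms=33)
print(interval)   # detection interval chosen
print(schedule)   # per-frame work
```

GPU contention from co-located applications effectively inflates `detect_cost_ms`, which in this toy model stretches the detection interval (trading accuracy for latency), mirroring the adaptation the benchmarked protocols perform at runtime.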

About the Presenter
Jayoung Lee is a first-year graduate student in Purdue's Agricultural and Biological Engineering (ABE) Department and a PhD student in ICAN, the Innovatory for Cells and Neural Machines (https://schaterji.io), led by Prof. Somali Chaterji at Purdue University. ABE is housed in both the College of Engineering and the College of Agriculture, which gives Jay the ability to apply his machine learning algorithms to IoT and digital agriculture, leveraging real-world applications and living testbeds (e.g., https://schaterji.io/testbed/) for his ML solutions. Jay is supported by both the Army Research Laboratory and the National Science Foundation (CPS), so his work looks at applications of computer vision algorithms in both IoBT and digital agriculture. When not working, Jay enjoys playing tennis and jogging. For more information, please follow us on Twitter: @somalichaterji, @jayounglee30.