While training and inference of deep learning models have received significant attention in recent years (e.g., Tiresias, AlloX, and Salus from our group), hyperparameter tuning is often overlooked or lumped into the same bucket of optimizations as training. Existing hyperparameter tuning solutions, primarily …
Tag Archives: GPU
Presented Keynote Speech at HotEdgeVideo’2020
Earlier this week, I presented a keynote speech on the state of resource management for deep learning at the HotEdgeVideo'2020 workshop, covering our recent works on systems support for AI (Tiresias, AlloX, and Salus) and discussing open challenges in this space.
AlloX Accepted to Appear at EuroSys’2020
While GPUs are always in the news when it comes to deep learning clusters (e.g., Salus or Tiresias), we are witnessing the emergence of many other computing devices (e.g., FPGAs and problem-specific accelerators), not to mention the traditional CPUs. All of them are compute devices, but one cannot expect the …
Salus Accepted to Appear at MLSys’2020
With the rising popularity of deep learning, GPUs have become increasingly widespread in recent years. Modern GPUs are extremely powerful, with abundant compute and memory resources. A key challenge in this context is ensuring that these devices are highly utilized. Although there has been a lot of research on improving GPU efficiency …
NSF Award to Expand Our Systems+AI Research!
This project aims to extend and expand our forays into micro- and macro-level GPU resource management for distributed deep learning applications.
Thanks, NSF!
Tiresias Accepted to Appear at NSDI’2019
With the advancement of AI in recent years, GPUs have emerged as a popular choice for training deep learning (DL) models on large datasets. To deal with ever-growing datasets, it is also common to run distributed deep learning over multiple GPUs in parallel. Achieving cost-effectiveness and high performance in these clusters relies on …