Since our pioneering work on Infiniswap, which attempted to make memory disaggregation practical, there have been quite a few proposals to use different application-level interfaces to remote memory over RDMA. A common issue faced by all these approaches is the high overhead of existing kernel data paths, whether they use the swapping … Continue Reading ››
AlloX Accepted to Appear at EuroSys’2020
While GPUs are always in the news when it comes to deep learning clusters (e.g., Salus or Tiresias), many more computing devices are emerging (e.g., FPGAs and problem-specific accelerators), alongside traditional CPUs. All of them are compute devices, but one cannot expect the … Continue Reading ››
Thanks KLA for Supporting Our Research
KLA became our neighbor last year. It turns out that building tool-chains for the semiconductor industry requires processing massive amounts of data and images in near-real-time, as well as addressing the corresponding resource and cluster management challenges.
We look forward to working together on related, interesting problems down the line. … Continue Reading ››
Salus Accepted to Appear at MLSys’2020
With the rising popularity of deep learning, GPUs have seen widespread adoption in recent years. Modern GPUs are extremely powerful, with abundant compute and memory resources. A key challenge in this context is making sure that these devices are highly utilized. Although there has been a lot of research on improving GPU efficiency … Continue Reading ››
Sol and Pando Accepted to Appear at NSDI’2020
With the advent of edge analytics and federated learning, the need for distributed computation and storage is only going to increase in the coming years. Unfortunately, existing solutions for analytics and machine learning have focused primarily on datacenter environments. When these solutions are applied to wide-area scenarios, their compute efficiency decreases and storage overhead … Continue Reading ››
Co-Organized NSF Workshop on Next-Gen Cloud Research Infrastructure
Earlier this week, Jack Brassil and I co-organized an NSF-supported workshop on next-generation cloud research infrastructure (RI) in Princeton, NJ. The workshop focused on the role of the cloud in research and education, how needs are changing, and how cloud infrastructure should evolve to keep up with those changing needs. … Continue Reading ››
Joint Award With CMU on Distributed Storage. Thanks NSF!
This project aims to build on our past and ongoing work with Rashmi Vinayak (CMU) and Harsha Madhyastha (Michigan) to address optimal performance-cost tradeoffs in distributed storage. It's always fun to have the opportunity to work with great friends and colleagues.
NSF Award to Expand Our Systems+AI Research!
This project aims to extend and expand our forays into micro- and macro-level GPU resource management for distributed deep learning applications.
Thanks NSF!
Received VMware Early Career Faculty Award!
A few weeks ago, I received a cold email from VMware Research's Irina Calciu with this great news! The award will support our ongoing research on memory disaggregation. VMware is doing some cool work in this space as well, and I look forward to collaborating with them in the near future.
Near Optimal Coflow Scheduling Accepted at SPAA’2019
Since the inception of coflow in 2012, the abstraction and the body of work surrounding it have been growing at a fast pace. In addition to systems building, we have seen a rise in theoretical analyses of the coflow scheduling problem. One of the most recent ones to this end has even received a Best Student Paper … Continue Reading ››