Tag Archives: NSDI
Deep learning, and machine learning in general, is taking over the world. It is, however, quite expensive to tune, train, and serve deep learning models. Naturally, improving the efficiency and performance of deep learning workflows has received significant attention (Salus, Tiresias, and Fluid to … Continue Reading ››
Justitia Accepted to Appear at NSDI’2022
The need for higher throughput and lower latency is driving kernel-bypass networking (KBN) in datacenters. Of the two related trends in KBN, hardware-based KBN is especially challenging because, unlike software KBN such as DPDK, it does not provide any control once a request is posted to the hardware. RDMA, which is the … Continue Reading ››
Kayak Accepted to Appear at NSDI’2021
As memory disaggregation, and resource disaggregation in general, becomes popular, one must decide whether to keep moving data from remote memory or to sometimes ship compute to the remote data instead. This is not a new problem, nor is it unique to disaggregated datacenters. The notion of data locality and associated … Continue Reading ››
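To see the trade-off Kayak navigates, consider a toy cost model: shipping data pays to move the data itself, while shipping compute pays only for a small request and result, at the cost of remote CPU. The sketch below is a back-of-the-envelope illustration with made-up numbers and a made-up cost model, not Kayak's actual policy (which adapts at runtime).

```python
# Toy model of the ship-data vs. ship-compute trade-off. All parameters are
# illustrative assumptions, not measurements or Kayak's runtime policy.

def ship_data_s(data_bytes, bandwidth_bps, local_compute_s):
    """Fetch the data over the network, then compute locally."""
    return data_bytes * 8 / bandwidth_bps + local_compute_s

def ship_compute_s(req_bytes, result_bytes, bandwidth_bps, remote_compute_s):
    """Send the request to where the data lives; only a small result returns."""
    return (req_bytes + result_bytes) * 8 / bandwidth_bps + remote_compute_s

if __name__ == "__main__":
    LINK = 100e9  # assumed 100 Gbps network
    for data in (4e3, 4e6, 4e9):  # 4 KB, 4 MB, 4 GB of remote state
        d = ship_data_s(data, LINK, local_compute_s=1e-3)
        c = ship_compute_s(1e3, 1e3, LINK, remote_compute_s=1.2e-3)
        winner = "ship compute" if c < d else "ship data"
        print(f"data={data:10.0e} B  ship-data={d:.4f}s  ship-compute={c:.4f}s  -> {winner}")
```

For 4 KB of remote state it is cheaper to pull the data; at 4 GB, shipping the request wins by orders of magnitude. The right answer shifts with data size and load, which is why this should not be a static, one-time choice.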
Sol and Pando Accepted to Appear at NSDI’2020
With the advent of edge analytics and federated learning, the need for distributed computation and storage is only going to increase in coming years. Unfortunately, existing solutions for analytics and machine learning have focused primarily on datacenter environments. When these solutions are applied to wide-area scenarios, their compute efficiency decreases and storage overhead … Continue Reading ››
Tiresias Accepted to Appear at NSDI’2019
With the advancement of AI in recent years, GPUs have emerged as a popular choice for training deep learning (DL) models on large datasets. To deal with ever-growing datasets, it is also common to run distributed deep learning over multiple GPUs in parallel. Achieving cost-effectiveness and high performance in these clusters relies on … Continue Reading ››
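A core idea in Tiresias is to prioritize DL jobs by their attained service measured two-dimensionally, as GPUs × time, so that jobs that have consumed the least GPU-time run first. The sketch below is a deliberately simplified least-attained-service loop in that spirit; it leaves out the paper's discretized priority queues, preemption overheads, and placement logic.

```python
import heapq

# Simplified least-attained-service scheduling in the spirit of Tiresias:
# a job's priority is the GPU-time (num_gpus * seconds) it has consumed,
# and the job with the least attained service runs next. Jobs run one at a
# time here purely to keep the sketch short.

class Job:
    def __init__(self, name, num_gpus, remaining_s):
        self.name, self.num_gpus, self.remaining_s = name, num_gpus, remaining_s
        self.attained = 0.0  # GPU-seconds consumed so far

def run(jobs, quantum_s=10.0):
    heap = [(j.attained, i, j) for i, j in enumerate(jobs)]
    heapq.heapify(heap)
    clock = 0.0
    while heap:
        _, i, job = heapq.heappop(heap)
        step = min(quantum_s, job.remaining_s)
        clock += step
        job.remaining_s -= step
        job.attained += job.num_gpus * step  # two-dimensional: GPUs x time
        if job.remaining_s > 0:
            heapq.heappush(heap, (job.attained, i, job))
        else:
            print(f"{job.name} finished at t={clock:.0f}s")

# An 8-GPU job accumulates attained service 8x faster, so the 1-GPU job
# naturally gets more turns and finishes first.
run([Job("small-1gpu", 1, 30), Job("big-8gpu", 8, 30)])
```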
Infiniswap Accepted to Appear at NSDI’2017
Update: Camera-ready version is available here. Infiniswap code is now on GitHub!
As networks become faster, the difference between remote and local resources is blurring every day. How can we take advantage of these blurred lines? This is the key observation behind resource disaggregation and, to some extent, rack-scale computing. In this paper, we take our … Continue Reading ››
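A quick back-of-the-envelope calculation shows just how blurred the lines are: paging 4 KB out of another machine's DRAM over a fast network can beat paging from a local disk by orders of magnitude. The numbers below are rough, assumed figures for illustration only, not measurements from the paper.

```python
# Rough paging-latency comparison. All latency and bandwidth figures are
# assumptions chosen for illustration, not numbers from Infiniswap.

PAGE_BITS = 4096 * 8

def remote_page_in_us(rtt_us, bandwidth_gbps):
    # Gbps -> bits per microsecond is a factor of 1e3.
    return rtt_us + PAGE_BITS / (bandwidth_gbps * 1e3)

remote_dram_us = remote_page_in_us(rtt_us=3.0, bandwidth_gbps=56)  # assumed RDMA fabric
ssd_us = 100.0    # assumed flash read latency
hdd_us = 5000.0   # assumed disk seek + read

print(f"remote DRAM ~{remote_dram_us:.1f} us/page, SSD ~{ssd_us:.0f} us, HDD ~{hdd_us:.0f} us")
```

With microsecond-scale networks, a remote page-in lands much closer to local memory than to local storage, which is exactly the gap remote-memory paging exploits.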
HUG Accepted to Appear at NSDI’2016
Update: Camera-ready version is available here now!
With the advent of cloud computing and datacenter-scale applications, simultaneously dealing with multiple resources is the new norm. When multiple parties have multi-resource demands, fairly dividing these resources (for some notion of fairness) is a core challenge in the resource allocation literature. Dominant Resource Fairness (DRF) in NSDI'2011 was the first work to … Continue Reading ››
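For readers who want the flavor of DRF itself, the sketch below implements the progressive-filling allocation described in the DRF paper, on that paper's running example (9 CPUs and 18 GB of memory; user A needs 1 CPU and 4 GB per task, user B needs 3 CPUs and 1 GB). Treat it as background only: a minimal sketch of DRF, not HUG's algorithm.

```python
# Progressive filling for Dominant Resource Fairness (DRF): repeatedly give
# one more task to the user with the smallest dominant share, until no
# user's next task fits in the remaining capacity.

def drf_allocate(capacities, demands):
    tasks = {u: 0 for u in demands}
    used = {r: 0.0 for r in capacities}

    def dominant_share(u):
        return max(tasks[u] * demands[u][r] / capacities[r] for r in capacities)

    while True:
        for u in sorted(demands, key=dominant_share):
            if all(used[r] + demands[u][r] <= capacities[r] for r in capacities):
                tasks[u] += 1
                for r in capacities:
                    used[r] += demands[u][r]
                break
        else:  # nobody's next task fits: allocation is done
            return tasks

# DRF's running example: A gets 3 tasks (3 CPUs, 12 GB), B gets 2 tasks
# (6 CPUs, 2 GB); both end with a dominant share of 2/3.
print(drf_allocate({"cpu": 9, "mem": 18},
                   {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}))
```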
Spark wins the Best Paper Award at NSDI’2012
Spark (Resilient Distributed Datasets/RDDs) has won the Best Paper award at NSDI 2012. Woohoo! We were also nominated for the inaugural Community Award for open-sourcing the project.
Spark has been accepted at NSDI’2012
Our paper "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing" has been accepted at NSDI'2012. This is Matei's brainchild and a joint work of a lot of people including, but not limited to, TD, Ankur, Justin, Murphy, and professors Ion Stoica, Scott Shenker, and Michael Franklin. Unlike many other systems papers, Spark is … Continue Reading ››