Distributed data-parallel queries run on large collections of resources — within a single datacenter in case of traditional big data queries and across multiple datacenters in case of geo-distributed analytics. The total amount of available resources to a running query changes over time for a variety of reasons, including changes in the fair share of assigned resources due to new job arrivals, machine failures, changes in spot instance pricing, fluctuations in available WAN bandwidth, etc. Traditional big data systems rely on the cluster scheduler to deal with these fluctuations. Each query is assigned a fixed query plan by a query planner to convert it to a distributed data-parallel job, which is then executed by an execution engine. Over the years, this has led to pushing more complexity into the cluster scheduler for better performance. But what if we could change the query plan?
In QOOP, we do exactly that and show that this can lead to simpler execution engines and cluster schedulers. Instead of pushing complexity down the stack, QOOP allows the query planner to re-plan queries on resource fluctuations using a simple greedy algorithm. We prove that this is expected to perform well and empirically validate the theoretical result.
Modern data processing clusters are highly dynamic — both in terms of the number of concurrently running jobs and their resource usage. To improve job performance, recent works have focused on optimizing the cluster scheduler and the jobs’ query planner with a focus on picking the right execution plan — represented as a directed acyclic graph (DAG) — for a job in a resource-aware manner, and scheduling jobs in a DAG-aware manner. However, because existing solutions use a fixed DAG throughout the entire execution, the inability to adapt a DAG in reaction to resource changes often leads to large performance inefficiencies.
This paper argues for dynamic query re-planning, wherein we re-evaluate and re-plan a job’s DAG during its execution. We show that designing for re-planning requires fundamental changes to the interfaces between key layers of data analytics stacks today, i.e., the query planner, the execution engine, and the cluster scheduler. Instead of pushing more complexity into the scheduler or the query planner, we argue for a redistribution of responsibilities between the three components to simplify their designs. Under this redesign, we analytically show that a greedy algorithm for re-planning and execution alongside a simple max-min fair scheduler can offer provably competitive behavior even under adversarial resource changes. We prototype our algorithms atop Apache Hive and Tez. Using extensive experiments on a 20-node cluster, we show that our design can offer a median performance improvement of 1.47x compared to state-of-the-art alternatives.
While QOOP focused on small-scale clusters, its core ideas can be applied to other dynamic query re-planning scenarios — for example, in response to WAN bandwidth fluctuations in the context of geo-distributed analytics.
This work is my second OSDI collaboration with Aditya after Carbyne in OSDI 2016 and my third OSDI paper overall. This time I got to work with another incredible student, Kshiteej Mahajan, as well as a terrific theoretician, Shuchi Chawla. I look forward to many more fruitful collaborations between our groups in the future.