We are glad to announce the first open-source release of Varys, an application-aware network scheduler for data-parallel clusters using the coflow abstraction. It’s a stripped-down dev-alpha release for the experts, so please be patient with it!
A quick overview of the system can be found at varys.net. Here is a 30-second summary:
Varys is an open-source network manager/scheduler that aims to improve the communication performance of Big Data applications. Its target applications/jobs include those written in Spark, Hadoop, YARN, BSP, and similar data-parallel frameworks.
Varys provides a simple API that allows data-parallel frameworks to express their communication requirements as coflows with minimal changes to the framework. Using coflows as the basic abstraction of network scheduling, Varys implements novel schedulers either to make applications faster or to make time-restricted applications complete within deadlines.
Primary features included in this initial release are:
- Support for in-memory and on-disk coflows,
- Efficient scheduling to minimize the average coflow completion time, and
- In the deadline-sensitive mode, support for soft deadlines.
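To give a feel for why coflow-level scheduling matters, here is a minimal Python sketch. It is not the Varys API: `Flow`, `Coflow`, `bottleneck_seconds`, and `average_completion_time` are hypothetical names made up for this illustration, the 1 Gbps port speed is an assumption, and running coflows strictly one at a time is a simplification. The shortest-bottleneck-first ordering shown is an illustrative heuristic, not necessarily what the scheduler in this release does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str        # sending machine (hypothetical identifier)
    dst: str        # receiving machine
    size_mb: float  # megabytes to transfer


@dataclass(frozen=True)
class Coflow:
    name: str
    flows: tuple    # all flows that must finish before the coflow is done


def bottleneck_seconds(coflow, link_mbps=1000.0):
    """Time the coflow needs on its most loaded port if it had the network alone.

    Assumes every machine has one uplink and one downlink of link_mbps each.
    """
    out_load, in_load = {}, {}
    for f in coflow.flows:
        out_load[f.src] = out_load.get(f.src, 0.0) + f.size_mb
        in_load[f.dst] = in_load.get(f.dst, 0.0) + f.size_mb
    worst_mb = max(list(out_load.values()) + list(in_load.values()))
    return worst_mb * 8 / link_mbps  # megabytes -> megabits -> seconds


def average_completion_time(coflows, link_mbps=1000.0):
    """Average coflow completion time when coflows run one after another.

    Simplification: a real scheduler may share bandwidth between coflows.
    """
    clock, total = 0.0, 0.0
    for c in coflows:
        clock += bottleneck_seconds(c, link_mbps)
        total += clock
    return total / len(coflows)


# Two shuffles that share the same senders: one large, one small.
big = Coflow("big-shuffle", (Flow("m1", "r1", 800), Flow("m2", "r1", 800)))
small = Coflow("small-shuffle", (Flow("m1", "r2", 100), Flow("m2", "r2", 100)))

arrival_order = [big, small]
shortest_first = sorted(arrival_order, key=bottleneck_seconds)

print(round(average_completion_time(arrival_order), 2))
print(round(average_completion_time(shortest_first), 2))
```

In this toy setup, serving the coflows in arrival order gives an average completion time of about 13.6 seconds, while serving the one with the smaller bottleneck first brings it down to about 8.0 seconds. The last coflow finishes at the same moment either way.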
Here are some links if you want to check it out, contribute to make it better, or point us to someone who can help.
Project Website: http://varys.net
Git repository: https://github.com/coflow/varys
Relevant tools: https://github.com/coflow
Research papers
1. Efficient Coflow Scheduling with Varys
2. Coflow: A Networking Abstraction for Cluster Applications
The project is still at an early stage and could use all the help it can get. We appreciate your feedback.