Michael Armbrust, Armando Fox, David A. Patterson, Nick Lanham, Beth Trushkowsky, Jesse Trutna, Haruki Oh, “SCADS: Scale-Independent Storage for Social Computing Applications,” CIDR, 2009. [PDF]
Summary
SCADS (Scalable Consistency Adjustable Data Storage) is a proposal for a collection of components leveraging database, control theory, and machine learning techniques to achieve data scale independence for rapidly growing (or shrinking) Web 2.0 services. It has three key components:
- A performance-insightful query language (PIQL) that provides strict scalability guarantees and predictable performance;
- A declarative way for developers to explicitly define their performance-consistency tradeoff requirements; and
- Machine learning models to add and remove capacity to meet SLA requirements.
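To make the first component more concrete, here is a minimal sketch of what a bounded-performance check on a query could look like. It is not PIQL syntax and not the authors' implementation: the names `BoundedLookup`, `worst_case_rows`, and `admit`, the index names, and the row budget are all hypothetical, and the sketch assumes every lookup step declares an upper limit on the rows it returns per input row.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BoundedLookup:
    """One step of a query plan (hypothetical): an index lookup that
    returns at most `limit` rows for each input row."""
    index: str
    limit: int  # declared cardinality cap, e.g. max friends per user


def worst_case_rows(plan: Sequence[BoundedLookup]) -> int:
    """Upper bound on rows touched when the steps run as nested lookups.

    The bound depends only on the declared limits, never on the total
    number of users or items stored, which is the scale-independence idea.
    """
    total = 0
    fanout = 1
    for step in plan:
        fanout *= step.limit  # each input row yields at most `limit` rows
        total += fanout
    return total


def admit(plan: Sequence[BoundedLookup], budget: int) -> bool:
    """Accept a query only if its worst case fits the row budget (assumed)."""
    return worst_case_rows(plan) <= budget


# "Latest status of each of my friends", with friends capped at 5000.
plan = [
    BoundedLookup(index="friends_by_user", limit=5000),
    BoundedLookup(index="latest_status_by_user", limit=1),
]
```

With these assumed limits, `worst_case_rows(plan)` is 10000 no matter how large the user base grows, so `admit(plan, 10000)` accepts the query while a tighter budget rejects it. A query with no declared limit on some step could not be bounded this way, which is the kind of query the proposal would restrict.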
Critique
I believe the key contributions of this proposal are restricting queries to those with bounded performance and letting developers explicitly specify their requirements and deadlines. There are several other interesting concepts embedded in different parts of the proposal; it is not clear, however, how influential the architecture as a whole will be.
The key tradeoff in SCADS is that it does whatever is required to obtain predictable performance. The authors are willing to restrict queries, spend CPU and disk on building additional indices, take input from developers and give them feedback, and do several other things to ensure predictability.
Since this is a position paper, the authors provide only high-level ideas without a concrete solution. Many of the components proposed in this paper have been developed since then, and there is a good chance that the overall architecture will see the light of day at some point, in some form.