Distributed in-memory datasets

October 30, 2011 Mosharaf

AMPLab, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," UCB/EECS-2011-82, 2011. [PDF]

Russell Power, Jinyang Li, "Piccolo: Building Fast, Distributed Programs with Partitioned Tables," OSDI, 2010. [PDF]

Summary

MapReduce and similar frameworks, while widely applicable, are limited to directed acyclic data flow models, do not expose global state, and are generally slow due … Continue Reading ››

Reviews

Cloud databases

October 25, 2011 Mosharaf

MIT, "Relational Cloud: A Database-as-a-Service for the Cloud," CIDR, 2011. [PDF]

Divyakant Agrawal, Amr El Abbadi, Sudipto Das, Aaron J. Elmore, "Database Scalability, Elasticity, and Autonomy in the Cloud," DASFAA, 2011. [PDF]

Relational Cloud

The key idea of the Relational Cloud project is to define the concept of transactional Database-as-a-Service (DBaaS), identify the key challenges toward … Continue Reading ››

Reviews

Declarative and finite state machine approaches to Cloud programming

October 22, 2011 Mosharaf

Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, Russell Sears, "BOOM Analytics: Exploring Data-Centric, Declarative Programming for the Cloud," EuroSys, 2010. [PDF]

Joe Armstrong, "Erlang: A Survey of the Language and Its Industrial Applications," Ninth Exhibition and Symposium on Industrial Applications of Prolog, 1996. [PDF]

BOOM

BOOM, or Berkeley Orders-Of-Magnitude, adopts a … Continue Reading ››

Reviews

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

October 17, 2011 Mosharaf

Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen, "Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS," SOSP, 2011. [PDF]

Summary

This paper introduces a new consistency model, causal+, that extends the causal consistency model and lies between sequential and causal consistency models. The authors claim that causal+ is the … Continue Reading ››

Reviews

PNUTS: Yahoo!’s Hosted Data Serving Platform

October 17, 2011 Mosharaf

Yahoo! Research, "PNUTS: Yahoo!’s Hosted Data Serving Platform," PVLDB, 2008. [PDF]

Summary

PNUTS is a scalable, highly available, and geographically distributed (but low latency) data store used by most Yahoo! online properties. To achieve both availability and partition tolerance, it uses a novel notion of consistency called per-record timeline consistency; under this model, all replicas of … Continue Reading ››

Reviews

Data-parallel pipelines using high-level languages

October 13, 2011 Mosharaf

Microsoft, "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language," OSDI, 2008. [PDF]

Google, "FlumeJava: Easy, Efficient Data-Parallel Pipelines," PLDI, 2010. [LINK]

Background

Data-parallel computing systems expose high-level abstractions that let users reason about distributed computations, while handling low-level tasks such as scheduling and fault tolerance automatically, without any user input. At … Continue Reading ››

Reviews

Dremel: Interactive Analysis of Web-Scale Datasets

October 10, 2011 Mosharaf

Google, "Dremel: Interactive Analysis of Web-Scale Datasets," VLDB, 2010. [PDF]

Summary

Dremel is Google's interactive ad hoc query system for analysis of read-only nested data. Unlike MapReduce, Dremel is aimed toward data exploration, monitoring, and debugging, where near real-time performance is of utmost importance. To achieve scalability and performance, Dremel builds upon three key ideas:

It … Continue Reading ››

Reviews

Dynamo: Amazon’s Highly Available Key-value Store

October 7, 2011 Mosharaf

Amazon, "Dynamo: Amazon's Highly Available Key-value Store," SOSP, 2007. [PDF]

Summary

Dynamo is a highly available (99.9th percentile) key-value storage mechanism that sacrifices traditional consistency models for eventual consistency to achieve availability. Dynamo works with a simple query model, where read/write (get() and put()) operations are performed on data items uniquely identified by their keys. … Continue Reading ››
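The keyed get()/put() query model mentioned above can be illustrated with a minimal sketch. This is a toy, single-node, in-memory store written only to show the shape of the interface; the class name `KeyValueStore`, the sample keys, and the byte-string values are hypothetical and are not taken from the Dynamo paper, and nothing here models replication or eventual consistency.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy single-node store (hypothetical; not Dynamo's actual API)."""

    def __init__(self) -> None:
        # Each data item is an opaque value identified only by its key.
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Write (or overwrite) the item identified by `key`.
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Read the item identified by `key`; None if it was never written.
        return self._items.get(key)


store = KeyValueStore()
store.put("cart:42", b"book")
print(store.get("cart:42"))
print(store.get("cart:99"))
```

The point of the sketch is only that every operation addresses exactly one item through its key; there are no multi-item queries or relational operations in this model.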

Reviews

Bigtable: A Distributed Storage System for Structured Data

October 6, 2011 Mosharaf

Google, "Bigtable: A Distributed Storage System for Structured Data," OSDI, 2006. [PDF]

Summary

Bigtable is a large-scale (petabytes of data across thousands of machines) distributed storage system for managing structured data. It is built on top of several existing Google technologies (e.g., GFS, Chubby, and Sawzall) and is used by many of Google's online … Continue Reading ››

Reviews

SCADS: Scale-Independent Storage for Social Computing Applications

October 4, 2011 Mosharaf

Michael Armbrust, Armando Fox, David A. Patterson, Nick Lanham, Beth Trushkowsky, Jesse Trutna, Haruki Oh, "SCADS: Scale-Independent Storage for Social Computing Applications," CIDR, 2009. [PDF]

Summary

SCADS (Scalable Consistency Adjustable Data Storage) is a proposal for a collection of components leveraging database, control theory, and machine learning techniques to achieve data scale independence for rapidly … Continue Reading ››

Mosharaf Chowdhury

Monthly Archives: October 2011
