Tag Archives: MapReduce

Dremel: Interactive Analysis of Web-Scale Datasets

Google, "Dremel: Interactive Analysis of Web-Scale Datasets," VLDB, 2010. [PDF]

Summary

Dremel is Google's interactive  ad hoc query system for analysis of read-only nested data. Unlike MapReduce, Dremel is aimed toward data exploration, monitoring, and debugging, where near real-time performance is of utmost importance. To achieve scalability and performance, Dremel builds upon three key ideas:
  1. It … Continue Reading ››

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks," EuroSys, 2007. [PDF]

Summary

Dryad is Microsoft's answer to the MapReduce paradigm, albeit at a (slightly) lower level with greater flexibility. Like MapReduce, Dryad allows developers to think about what to do with the data, and Dryad … Continue Reading ››

MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean, Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," OSDI, 2004. [PDF]

Summary

MapReduce is a programming model and associated implementation for processing and generating large data sets in a parallel, fault-tolerant, distributed, and load-balanced manner. There are two main functions (both user provided) in this programming model. The map function takes an input … Continue Reading ››