We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce. RDDs are motivated by two types of applications …
Technical report on Spark is available online
A technical report describing the key concepts behind Spark is available online. The abstract is given below: