Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” SOSP, October 2003. [PDF]
Summary
The Google File System (GFS) is a scalable, fault-tolerant distributed file system custom-designed to handle Google’s data-intensive workloads. GFS provides high aggregate throughput to a large number of readers and append-only writers (i.e., no overwrites) in a fault-tolerant manner while running on inexpensive commodity hardware. It is optimized for large files, sequential reads/writes, and high sustained throughput rather than low latency. GFS uses a single master with minimal involvement in regular operations and a relaxed consistency model to simplify concurrent operations by many clients on the same files.
GFS sits on top of the local file systems of individual machines to create a meta-file system; each participating machine is called a chunkserver. The master keeps file system metadata in memory, maintaining chunk locations as soft state that is periodically refreshed by talking to the chunkservers. GFS divides files into large 64 MB chunks and replicates these chunks across machines and racks for fault tolerance and availability. It uses daisy-chained writes to better utilize network bandwidth. It is not POSIX compliant but supports a reasonable set of file system operations (e.g., create, delete, open, close, read, and write) sufficient for Google’s needs. Unlike traditional file systems, GFS supports atomic record appends, which allow multiple clients to append records to the same file concurrently.
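To make the chunk-based design concrete, the following is a minimal Python sketch of how a client read might be resolved to chunkservers. The names Master, read_from_replica, and chunk_table are hypothetical stand-ins; only the 64 MB chunk size, the master’s in-memory metadata lookup, and the direct client-to-chunkserver data path follow the paper.

```python
# A minimal, hypothetical sketch of the GFS client read path.
# Master, read_from_replica, and chunk_table are illustrative names;
# the real client library is not public.

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses a fixed 64 MB chunk size

class Master:
    """Maps (filename, chunk index) to a chunk handle and replica locations."""
    def __init__(self):
        # Chunk locations are soft state, refreshed via chunkserver heartbeats.
        self.chunk_table = {}  # (filename, chunk_index) -> (handle, [replicas])

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

def read_from_replica(replica, handle, chunk_offset, length):
    # Placeholder for the RPC that fetches `length` bytes from a chunkserver.
    return b"\x00" * length

def read(master, filename, offset, length):
    """Read `length` bytes at byte `offset`, translating chunk by chunk."""
    data = b""
    while length > 0:
        chunk_index = offset // CHUNK_SIZE   # which 64 MB chunk holds the offset
        chunk_offset = offset % CHUNK_SIZE   # position within that chunk
        handle, replicas = master.lookup(filename, chunk_index)
        n = min(length, CHUNK_SIZE - chunk_offset)
        # Bulk data flows directly between the client and a chunkserver;
        # the master is consulted only for (cacheable) metadata.
        data += read_from_replica(replicas[0], handle, chunk_offset, n)
        offset += n
        length -= n
    return data
```

The large chunk size is what makes the single-master design viable: it keeps the master’s metadata small enough to fit in memory and reduces how often clients need to contact the master at all.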
Key tradeoffs in GFS’s design include throughput vs. latency, buying an expensive storage system vs. building a distributed file system from commodity hardware, extra write traffic for replication vs. savings in read traffic due to data locality, and the simplicity of a single master and a relaxed consistency model vs. a highly consistent but more complex design. Notably, the authors almost always opt for the simpler design at the expense of many highly regarded traditional file system properties.
Comments
GFS has successfully achieved the goals of performance, availability, reliability, and scale that its designers set out to achieve. In addition, it directly motivated the design of the Hadoop Distributed File System (HDFS), its open-source equivalent, which is widely used in industry. Higher-level storage systems have since been built on top of HDFS, a testament to GFS’s success. As long as MapReduce-style, large, read- and append-only workloads are around, GFS and HDFS can be expected to remain relevant as well.