John L. Hennessy, David A. Patterson, “Graphics Processing Units (GPUs),” Chapter 4 in Computer Architecture: A Quantitative Approach, Fifth Edition, 2011. [LINK]
Mark D. Hill, Michael R. Marty, “Amdahl’s Law in the Multicore Era,” IEEE Computer, July 2008. [PDF]
H. Howie Huang, Shan Li, Alex Szalay, Andreas Terzis, “Performance Modeling and Analysis of Flash-based Storage Devices,” 27th IEEE Symposium on Massive Storage Systems and Technologies (MSST), 2011. [PDF]
Summary
We are in the middle of a massive shift in the hardware landscape. All aspects of computing, be it processing, memory, storage, or networking, are changing very fast, and this set of articles touches on some of those changes. Hennessy and Patterson discuss the utility and applicability of the SIMD (Single Instruction, Multiple Data) architectures that come with modern Graphics Processing Units (GPUs). Hill and Marty extend Amdahl’s Law to the era of many-core processors and essentially promote asymmetric multicore chips over symmetric ones, with a preference for a more dynamic design. Lastly, Huang et al. make a case for Flash-based storage devices (read: SSDs) by comparing their performance against rotational HDDs using a black-box model.
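For context, Hill and Marty’s speedup models (in their notation: a chip budget of n base-core equivalents or BCEs, a rich core built from r BCEs with sequential performance perf(r), and parallel fraction f) are roughly the following:

    \mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f \cdot r}{\mathrm{perf}(r) \cdot n}}

    \mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}

    \mathrm{Speedup}_{\mathrm{dynamic}}(f, n, r) = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{n}}

The asymmetric and dynamic designs come out ahead because the serial fraction always runs at perf(r), while the parallel fraction can still use the remaining n − r base cores (or, in the dynamic case, all n BCEs), which is exactly why the authors favor them over symmetric chips.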
Before spitting out my two cents, I have to say that none of the articles covers networking trends. Things are getting faster on the wire too: 10Gbps and 40Gbps links are becoming increasingly common in the higher layers of the datacenter hierarchy, and some facilities are even trying out 100Gbps interconnects. Also, while discussing many-core/multicore chips, the articles ignore the question of memory bandwidth to and from those chips.
Possible Impacts on Future Datacenter Designs
CPU
The advance of many-core chips is unavoidable. We can expect to see asymmetric designs in the future that combine one (or more) strong cores with many weak ones. In fact, there is a similarity between the asymmetric design described by Hill and Marty and the current CPU-plus-GPU setup in our desktop machines; recent calls by many vendors to put both on the same chip point toward a move to asymmetric designs. Cluster schedulers will have to be made aware of the presence of such strong and weak cores as well, as sketched below.
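A minimal sketch of what heterogeneity-aware placement might look like, assuming a hypothetical scheduler (none of these names come from a real system): serial-dominated tasks go to the strong core, highly parallel ones to the pool of weak cores.

    # Hypothetical sketch: steer latency-critical / sequential tasks to the
    # strong core(s) and throughput-oriented tasks to the many weak cores.
    from dataclasses import dataclass, field

    @dataclass
    class Core:
        kind: str                        # "strong" or "weak"
        queue: list = field(default_factory=list)

    @dataclass
    class Task:
        name: str
        parallel_fraction: float         # rough estimate of how parallelizable the task is

    def place(task, cores):
        # Serial-dominated tasks benefit most from the strong core;
        # highly parallel tasks can be spread over the weak cores.
        kind = "strong" if task.parallel_fraction < 0.5 else "weak"
        target = min((c for c in cores if c.kind == kind), key=lambda c: len(c.queue))
        target.queue.append(task)
        return target

    cores = [Core("strong")] + [Core("weak") for _ in range(8)]
    place(Task("query-coordinator", 0.2), cores)   # lands on the strong core
    place(Task("map-worker", 0.95), cores)         # lands on a weak core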
However, GPU-based parallelism requires a great deal of attention from software designers and developers. Unless something fundamentally different happens (perhaps in terms of programming abstractions), it is likely to stay that way and will be avoided in general cluster software systems as much as possible.
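To make the "too much attention" point concrete, here is a toy comparison in plain Python (not CUDA, and not anyone's real kernel): the same SAXPY computation written first as a straightforward loop and then in a SIMT-like style with explicit blocks, thread indices, and boundary guards, which is the kind of bookkeeping GPU kernels push onto developers.

    # Plain sequential version: what most cluster software would like to write.
    def saxpy(a, x, y):
        return [a * xi + yi for xi, yi in zip(x, y)]

    # SIMT-flavored version: the same computation, but the programmer now manages
    # a grid of blocks, per-thread indices, and out-of-range guards, and has to
    # pick a block size -- the flavor of work real GPU kernels demand.
    def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
        i = block_idx * block_dim + thread_idx
        if i < len(x):                       # boundary guard, as in a real kernel
            out[i] = a * x[i] + y[i]

    def launch(n, block_dim, a, x, y):
        out = [0.0] * n
        grid_dim = (n + block_dim - 1) // block_dim
        for b in range(grid_dim):            # on a GPU these run in parallel
            for t in range(block_dim):
                saxpy_kernel(b, t, block_dim, a, x, y, out)
        return out

    assert launch(10, 4, 2.0, list(range(10)), [1.0] * 10) == saxpy(2.0, list(range(10)), [1.0] * 10)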
Storage/Memory
Given the weak write/update/erase performance of Flash-based devices, along with their cost overhead, we are likely to see one more level in the storage/memory hierarchy. Between the in-memory cache and the underlying HDDs, a new layer of SSDs will be introduced for read-heavy workloads. Writes should go directly from memory down to the HDDs and remain there until the corresponding data are identified as read-heavy and brought up to the SSD layer.
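A rough sketch of the promotion policy I have in mind (the class name and threshold below are made up for illustration, not taken from any of the papers): writes land on the HDD tier, and a block is copied up to the SSD tier only after it has proven itself read-heavy.

    # Illustrative tiering policy: writes go to HDD to avoid SSD write/erase
    # overhead; blocks are promoted to an SSD read layer after enough reads.
    READ_PROMOTE_THRESHOLD = 4               # made-up threshold

    class TieredStore:
        def __init__(self):
            self.hdd = {}                    # block_id -> data (authoritative copy)
            self.ssd = {}                    # block_id -> data (read cache for hot blocks)
            self.read_counts = {}

        def write(self, block_id, data):
            self.hdd[block_id] = data        # writes bypass the SSD layer entirely
            self.ssd.pop(block_id, None)     # invalidate any stale SSD copy
            self.read_counts[block_id] = 0

        def read(self, block_id):
            if block_id in self.ssd:
                return self.ssd[block_id]
            data = self.hdd[block_id]
            self.read_counts[block_id] = self.read_counts.get(block_id, 0) + 1
            if self.read_counts[block_id] >= READ_PROMOTE_THRESHOLD:
                self.ssd[block_id] = data    # promote read-heavy block to SSD
            return data

    store = TieredStore()
    store.write("b1", b"payload")
    for _ in range(5):
        store.read("b1")                     # after 4 reads, "b1" is served from SSD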
We are already seeing calls for putting everything (or a lot) in memory (e.g., in RAMCloud). This trend will continue to grow until it becomes prohibitively expensive. Cluster memory/cache management systems like Memento will become more and more common.
Network
Network designs will change toward a minimum of 10Gbps links, and full bisection bandwidth will become a fact of life. The problems, however, will not completely go away: full bisection bandwidth does not mean infinite bandwidth, and network management systems (e.g., Orchestra) within and across datacenters will become increasingly common.
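A back-of-the-envelope example of why full bisection bandwidth is not infinite bandwidth (the numbers are made up): in an all-to-one shuffle, the receiver's own 10Gbps NIC is the bottleneck no matter how good the fabric is, which is exactly the kind of transfer a system like Orchestra has to manage.

    # Toy calculation: an all-to-one shuffle is limited by the receiver's NIC,
    # even on a full-bisection-bandwidth fabric. Numbers are illustrative.
    link_gbps = 10                           # per-host NIC speed
    senders = 50
    data_per_sender_gb = 2                   # gigabytes each sender must ship

    total_gbits = senders * data_per_sender_gb * 8
    best_case_seconds = total_gbits / link_gbps   # receiver NIC is the bottleneck
    print(f"Best-case shuffle time: {best_case_seconds:.0f} s")   # 80 s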