H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, “Looking Up Data in P2P Systems,” Communications of the ACM, V. 46, N. 2, (February 2003). [ACM]
Summary
P2P systems have attracted large following due to their manageability, flexibility, resource-aggregation capabilities, and fault-tolerance properties. One of the biggest challenges in this area of research is finding a given data item in a large P2P system in a scalable manner. This paper provides a brief overview of the lookup problem, mostly from a DHT-based structured P2P systems’ points of view.
Centralized and hierarchical lookup of data in conventional systems suffer from scalability and resiliency issues. To overcome that, P2P systems developed a notion of symmetric lookup algorithms, which can be primarily categorized into unstructured, semi-structured, and structured mechanisms. DHT-based solutions to the lookup problem are both structured and symmetric and work well in dynamic environments where nodes join and leave the network as well as can randomly fail. In its core, a DHT implements just one function, lookup(key), that finds a data item given its uniquely identifiable key: nodes forward requests they cannot retrieve locally to nodes whose ID is closer to the destination based on some distance function. It also stores keys in a load-balanced way among the participating nodes, before it can actually look them up.
The authors divide DHT-based lookup systems into single- and multi-dimensional implementations and briefly explain three examples. Chord and Pastry use single-dimensional circle and tree structures, respectively; whereas, CAN uses a d-dimensional Cartesian coordinate space. All of them try to route a query to the data node based on a distance metric, as mentioned earlier, and provide different complexities based on space-time trade-offs. The characteristics of the distance function highly affects the workflow of a DHT implementation. For example, being not symmetric, Chord’s distance metric requires it to use a stabilization protocol in the background unlike Pastry, which uses a symmetric distance metric. Several open challenges, circa 2002-03, are listed at the end of the paper. Most, if not all, of them are either completely solved by now or reached their saturation points.
So the real question is whether p2p systems is old hat or still worthy of study? File sharing systems are pretty well developed at this stage, with Bittorrent as one of the most successful, and as I far as I know, it does not really use anything like Chord for its operation. Have these ideas really impacted systems, in the final assessment?
There are literally thousands of papers on p2p without much mainstream success, except for BT. And BT itself does not use well known research works like Chord or CAN or Pastry or Tapestry (I think it uses or at least used to use some moderately large variation of Kademlia). So my view is either p2p research is all done or it arrived before its time and will flourish if some other factors (no idea about what they might be) come into play. Overall, its just going no where with hundreds of papers every year.