Almost five years after Infiniswap, memory disaggregation is now a mainstream research topic. It goes by many names, but the idea of accessing (unused/stranded) memory over high-speed networks is now close to reality. Despite many works in this line of research, a key remaining problem is ensuring resilience: how can applications recover quickly if remote memory fails, becomes corrupted, or is inaccessible? Keeping a copy on disk, as Infiniswap does, causes a performance bottleneck under failure, but keeping in-memory replicas doubles the memory overhead. We started Hydra to hit a sweet spot between these two extremes by applying lessons learned from our EC-Cache project to extremely small objects. While EC-Cache explicitly focused on very large objects in the MB range, Hydra aims to perform erasure coding at the 4KB page granularity, at the microsecond timescales common for RDMA. Additionally, we extended Asaf’s CopySets idea to erasure coding to tolerate concurrent failures with low overhead.
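To make the page-granularity idea concrete, here is a minimal sketch of splitting one 4KB page into k data splits plus a parity split and rebuilding the page when one split is lost. It is a simplification, not Hydra's actual scheme: it uses a single XOR parity, which tolerates only one lost split, and the function names (`encode_page`, `decode_page`, `xor_blocks`) and the choice of k are hypothetical.

```python
from functools import reduce
from typing import List, Optional

PAGE_SIZE = 4096  # 4KB page, the granularity discussed above


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_page(page: bytes, k: int) -> List[bytes]:
    """Split a page into k data splits and append one XOR parity split.

    Assumption: a single parity split stands in for a real erasure code.
    """
    if len(page) != PAGE_SIZE or PAGE_SIZE % k != 0:
        raise ValueError("page must be 4096 bytes and k must divide 4096")
    size = PAGE_SIZE // k
    splits = [page[i * size:(i + 1) * size] for i in range(k)]
    return splits + [xor_blocks(splits)]


def decode_page(shards: List[Optional[bytes]], k: int) -> bytes:
    """Rebuild the page from k + 1 shards, at most one of which is None."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can recover only one lost shard")
    shards = list(shards)
    if missing and missing[0] < k:
        # A lost data split equals the XOR of all surviving shards.
        survivors = [shard for shard in shards if shard is not None]
        shards[missing[0]] = xor_blocks(survivors)
    return b"".join(shards[:k])
```

With k = 8, each split is 512 bytes and the page occupies 9/8 of its original size across remote machines, compared with 2× for a full in-memory replica. Tolerating several concurrent failures requires a code with more than one parity split.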
We present Hydra, a low-latency, low-overhead, and highly available resilience mechanism for remote memory. Hydra can access erasure-coded remote memory within a single-digit μs read/write latency, significantly improving the performance-efficiency tradeoff over the state-of-the-art – it performs similarly to in-memory replication with 1.6× lower memory overhead. We also propose CodingSets, a novel coding group placement algorithm for erasure-coded data, that provides load balancing while reducing the probability of data loss under correlated failures by an order of magnitude. With Hydra, even when only 50% memory is local, unmodified memory-intensive applications achieve performance close to that of the fully in-memory case in the presence of remote failures and outperform the state-of-the-art remote-memory solutions by up to 4.35×.
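The 1.6× figure can be sanity-checked with simple arithmetic. The sketch below assumes a configuration of k = 8 data splits and r = 2 parity splits; that choice is an assumption made here because it reproduces the stated ratio, not something the text above specifies. The function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Memory used per byte of data when keeping `copies` full copies."""
    return float(copies)


def erasure_overhead(k: int, r: int) -> float:
    """Memory used per byte of data with k data splits and r parity splits."""
    return (k + r) / k


# Assumed configuration: k = 8, r = 2 (not stated in the text above).
replicated = replication_overhead(2)    # two in-memory copies
coded = erasure_overhead(8, 2)          # (8 + 2) / 8
savings = replicated / coded            # how many times lower the overhead is
print(f"replication: {replicated:.2f}x, erasure coding: {coded:.2f}x, "
      f"ratio: {savings:.2f}x")
```

Under this assumption, erasure coding stores 1.25 bytes per byte of data versus 2 bytes for replication, which gives the 1.6× reduction.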
Youngmoon started working on Hydra right after we presented Infiniswap in early 2017! As Youngmoon graduated, Hasan started leading the project from early 2019, and Asaf joined us later that year. Together they significantly improved the paper over early drafts. Even then, Hydra faced immense challenges. In the process, Hydra has now overtaken Justitia for the dubious distinction of holding my current record for accepted-after-N-submissions.
This was my first time submitting to FAST.