COAST | SHEPHERD

The rapid rise in social connectivity and e-commerce requires ever more computer memory to store and process larger amounts of data with low latency. But more memory also means more memory errors, which can disrupt service availability and cause data loss. In this context, SHEPHERD investigates low-overhead, cross-layer memory-resilience techniques in which software and hardware cooperate to prevent service unavailability and data loss caused by memory errors in disaggregated memory.

Publications

2024

IISWC

Taming Performance Variability caused by Client-Side Hardware Configuration

Georgia Antoniou, Haris Volos, and Yiannakis Sazeides

In IISWC ’24: Proceedings of the 2024 IEEE International Symposium on Workload Characterization, 2024

2021

CAL

The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory

Haris Volos

IEEE Computer Architecture Letters, 2021


Disaggregated memory leverages recent technology advances in high-density, byte-addressable non-volatile memory and high-performance interconnects to provide a large memory pool shared across multiple compute nodes. Because of the higher memory density, memory errors may become more frequent. Unfortunately, tolerating memory errors with existing memory-error protection techniques becomes impractical because of their increasing storage cost. This letter proposes replication-aware memory-error protection to improve the storage efficiency of protection in data-centric applications that already rely on memory replication for performance and availability. It lets such applications lower the protection storage cost by weakening the protection of each individual replica, while still meeting a strong protection target through the collective protection conferred by multiple replicas.
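The trade-off in the abstract can be illustrated with a toy reliability model. This is a hedged sketch, not the letter's own model or numbers. It assumes independent bit flips at a fixed raw bit error rate, replicas that fail independently, and perfect detection of errors beyond a codeword's correction capability t (so a reader can always fall back to a healthy replica). It also assumes a binary BCH-style code that spends roughly m·t check bits to correct t errors. The constants `DATA_BITS`, `M_BITS`, the error rate and the loss target, as well as the helper names `p_uncorrectable`, `p_loss` and `min_t`, are hypothetical.

```python
from math import comb

# Hypothetical parameters (not from the letter): 512-bit data blocks protected
# by a binary BCH-style code costing M_BITS check bits per corrected bit error.
DATA_BITS = 512
M_BITS = 10  # 2**10 - 1 = 1023 covers the codeword length for every t <= 32


def p_uncorrectable(n_bits: int, t: int, ber: float) -> float:
    """P(more than t of n_bits flip), assuming independent flips at rate ber."""
    if t >= n_bits:
        return 0.0
    k = t + 1
    term = comb(n_bits, k) * ber**k * (1.0 - ber) ** (n_bits - k)  # P(X = k)
    total = 0.0
    for i in range(k, n_bits + 1):
        total += term
        # Binomial recurrence: P(X = i+1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
        term *= (n_bits - i) / (i + 1) * ber / (1.0 - ber)
    return total


def p_loss(t: int, replicas: int, ber: float) -> float:
    """A block is lost only if every replica holds an uncorrectable error.

    Assumes independent replica failures and perfect detection of errors
    beyond t, so any healthy replica can serve the read.
    """
    n_bits = DATA_BITS + M_BITS * t  # data bits plus check bits per codeword
    return p_uncorrectable(n_bits, t, ber) ** replicas


def min_t(target: float, replicas: int, ber: float, t_max: int = 32) -> int:
    """Weakest per-replica correction strength t that meets the loss target."""
    for t in range(t_max + 1):
        if p_loss(t, replicas, ber) <= target:
            return t
    raise ValueError("target not reachable with t <= t_max")


if __name__ == "__main__":
    BER = 1e-4      # hypothetical raw bit error rate of a dense memory device
    TARGET = 1e-15  # hypothetical per-block data-loss target
    for r in (1, 2, 3):
        t = min_t(TARGET, r, BER)
        print(f"replicas={r}: t={t}, check bits per replica={M_BITS * t}")
```

With these illustrative inputs, a single copy needs t = 8 (80 check bits) to reach the target on its own. When two replicas share the target, each needs only t = 4 (40 check bits), which halves the protection storage compared with protecting every replica as if it stood alone. The independence assumption is the weak point of such a model: correlated failures, such as a whole memory device going bad, would erode the collective protection and would need to be accounted for separately.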

Funding info

Grant agreement ID: 101029391
Start date: 1 September 2021
End date: 30 December 2024
Funded under: H2020-EU.1.3.2.
Coordinated by: University of Cyprus