MANA for MPI: MPI-Agnostic Network-Agnostic Transparent Checkpointing

Garg, R; Price, G; Cooperman, G

Garg, R (reprint author), Northeastern Univ, Boston, MA 02115 USA.

HPDC'19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019, p. 49.

Abstract

Transparently checkpointing MPI for fault tolerance and load balancing is a long-standing problem in HPC. The problem has been complicated by the need......