This document surveys checkpointing algorithms for fault tolerance in distributed systems, emphasizing the role of checkpointing in allowing a system to recover from failures. It distinguishes between traditional and mobile distributed systems, highlighting challenges specific to mobile environments, such as limited storage and power. It also analyzes several classes of checkpointing techniques, including user-triggered checkpointing, coordinated checkpointing, and message logging, and details the advantages and disadvantages of each.