Content Tags

There are no tags.

Asymptotic efficiency of restart and checkpointing.

RSS Source
Authors
Antonio Sodre

Many tasks are subject to failure before completion. Two of the most commonfailure recovery strategies are restart and checkpointing. Under restart, oncea failure occurs, it is restarted from the beginning. Under checkpointing, thetask is resumed from the preceding checkpoint after the failure. We studyasymptotic efficiency of restart for an infinite sequence of tasks, whose sizesform a stationary sequence. We define asymptotic efficiency as the limit of theratio of the total time to completion in the absence of failures over the totaltime to completion when failures take place. Whether the asymptotic efficiencyis positive or not depends on the comparison of the tail of the distributionsof the task size and the random variables governing failures. Our frameworkallows for variations in the failure rates and dependencies between task sizes.We also study a similar notion of asymptotic efficiency for checkpointing whenthe task is infinite a.s. and the inter-checkpoint times are i.i.d.. Moreover,in checkpointing, when the failures are exponentially distributed, we prove theexistence of an infinite sequence of universal checkpoints, which are alwaysused whenever the system starts from any checkpoint that precedes them.

Stay in the loop.

Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.