The present invention provides a system and method for the execution of
jobs in a distributed computing architecture that uses worker clients,
the system being characterized by a checkpointing mechanism component,
assigned to at least one worker client, for generating checkpointing
information, at least one failover system being assigned to the worker
client, a component (failover system selection component) for
automatically assigning at least one existing or newly created failover
system to the failure system being assigned to a worker client in the
case said worker client fails, wherein the assigned failover system
provides all function components required to take over the execution of
the job when said assigned worker client fails, wherein the assigned
failover system further includes at least a failover monitor component
for detecting failover situations of said assigned worker client.
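
The interplay of the components named above can be illustrated with a minimal sketch. It is not the claimed implementation: the class and function names (Checkpoint, FailoverSystem, select_failover, run_job), the use of checkpoints as heartbeats, the tick-based timeout, and the toy summation job are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    job_id: str
    next_step: int    # first step that has not been completed yet
    partial_sum: int  # job state accumulated so far


@dataclass
class FailoverSystem:
    """Hypothetical failover system assigned to one worker client."""
    name: str
    heartbeat_timeout: int = 3  # ticks of silence treated as a failure
    last_heartbeat: int = 0
    checkpoint: Optional[Checkpoint] = None

    def receive(self, checkpoint: Checkpoint, now: int) -> None:
        # Assumption: each checkpoint also serves as a heartbeat.
        self.checkpoint = checkpoint
        self.last_heartbeat = now

    def worker_failed(self, now: int) -> bool:
        # Failover monitor: report failure after a silent interval.
        return now - self.last_heartbeat > self.heartbeat_timeout

    def take_over(self, total_steps: int) -> int:
        # Resume from the last checkpoint instead of restarting the job.
        cp = self.checkpoint
        start, acc = (cp.next_step, cp.partial_sum) if cp else (0, 0)
        for step in range(start, total_steps):
            acc += step
        return acc


def select_failover(pool: List[FailoverSystem]) -> FailoverSystem:
    # Selection component: reuse an existing system or create a new one.
    return pool.pop() if pool else FailoverSystem(name="failover-new")


def run_job(total_steps: int, crash_at: Optional[int],
            failover: FailoverSystem) -> Optional[int]:
    """Worker client: sum 0..total_steps-1, checkpointing after each step."""
    acc = 0
    for step in range(total_steps):
        if step == crash_at:
            return None  # simulated crash: no result is produced
        acc += step
        failover.receive(Checkpoint("job-1", step + 1, acc), now=step)
    return acc


if __name__ == "__main__":
    failover = select_failover([])                 # no existing system: create one
    result = run_job(10, crash_at=6, failover=failover)
    if result is None and failover.worker_failed(now=9):
        result = failover.take_over(10)            # continue from step 6
    print(result)
```

In this sketch the failover system completes only the steps the worker client did not checkpoint, so the final result matches an uninterrupted run of the same job.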