A system and method for fault containment and error handling within a
domain in a partitioned computer system include a system manager having
read and write access to a resource definition table. The system manager
is adapted to quiesce the system when a failure occurs within a domain,
identify an allocated resource associated with the failed domain,
identify a non-failed domain, and exit the quiesce mode for the
non-failed domain, thereby containing a failure within the failed domain.
The system manager further handles an error within the failed domain by
deallocating a resource allocated to the failed domain so that the
resource becomes available to non-failed domains.
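
The sequence described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class names (ResourceDefinitionTable, SystemManager), the method names, and the representation of the resource definition table as a mapping from resource name to owning domain are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ResourceDefinitionTable:
    """Hypothetical table: resource name -> owning domain (None = unallocated)."""
    owner: Dict[str, Optional[str]] = field(default_factory=dict)

    def resources_of(self, domain: str) -> List[str]:
        # Read access: list the resources currently allocated to a domain.
        return sorted(r for r, d in self.owner.items() if d == domain)

    def deallocate(self, resource: str) -> None:
        # Write access: mark a resource as unallocated.
        self.owner[resource] = None

    def free_resources(self) -> List[str]:
        return sorted(r for r, d in self.owner.items() if d is None)


@dataclass
class SystemManager:
    """Hypothetical system manager with read/write access to the table."""
    table: ResourceDefinitionTable
    domains: Set[str]
    quiesced: Set[str] = field(default_factory=set)

    def handle_failure(self, failed: str) -> List[str]:
        if failed not in self.domains:
            raise ValueError(f"unknown domain: {failed}")

        # 1. Quiesce the whole system when a failure occurs within a domain.
        self.quiesced = set(self.domains)

        # 2. Identify the resources allocated to the failed domain.
        held = self.table.resources_of(failed)

        # 3. Identify the non-failed domains.
        healthy = self.domains - {failed}

        # 4. Exit quiesce mode for the non-failed domains only, so the
        #    failure stays contained within the failed domain.
        self.quiesced -= healthy

        # 5. Handle the error: deallocate the failed domain's resources so
        #    they become available to non-failed domains.
        for resource in held:
            self.table.deallocate(resource)
        return held


if __name__ == "__main__":
    table = ResourceDefinitionTable({"cpu0": "A", "cpu1": "B", "mem0": "B"})
    manager = SystemManager(table, {"A", "B"})
    released = manager.handle_failure("B")
    print(released, manager.quiesced, table.free_resources())
```

In this sketch the failed domain remains quiesced after the call, while every other domain resumes and the released resources appear as unallocated in the table. How quiescing is actually signalled to hardware or firmware is outside the scope of the sketch.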