A processing system and computer program provide memory power management
and memory failure management in large-scale systems. Upon a decision to
take a memory module off-line or place the module in an increased-latency
state for power management, or upon a notification that a memory module
has failed or been taken off-line or has had latency increased by another
power management control mechanism, a hypervisor that supports multiple
virtual machines uses a reverse mapping to check which pages each virtual
machine and its guest operating system are using. The hypervisor
determines which virtual machines are using a particular machine memory
page and may re-map that page to another available machine memory page,
or may notify the virtual machines, via a fault or other notification
mechanism, that the memory page has become or is becoming unavailable.
Alternatively, or in the absence of a response from a virtual machine,
the hypervisor can shut down the affected partition(s).
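
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name Hypervisor, the rmap and p2m tables, the notify callback, and the policy of trying a re-map first, then notification, then shutdown are all assumptions made for illustration. Copying page contents to the replacement page is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Hypervisor:
    """Toy model of a hypervisor (all names here are hypothetical)."""
    # Reverse map: machine page -> set of (vm_id, guest_page) that use it.
    rmap: Dict[int, Set[Tuple[int, int]]] = field(default_factory=dict)
    # Forward map per VM: guest page -> machine page.
    p2m: Dict[int, Dict[int, int]] = field(default_factory=dict)
    free_pages: List[int] = field(default_factory=list)
    shut_down: Set[int] = field(default_factory=set)

    def map_page(self, vm: int, gpage: int, mpage: int) -> None:
        self.p2m.setdefault(vm, {})[gpage] = mpage
        self.rmap.setdefault(mpage, set()).add((vm, gpage))

    def users_of(self, mpage: int) -> Set[int]:
        """Which virtual machines use this machine page (reverse lookup)."""
        return {vm for vm, _ in self.rmap.get(mpage, set())}

    def _take_free_page(self, excluded: Set[int]) -> Optional[int]:
        # Pick a spare machine page that is not on the affected module.
        for i, page in enumerate(self.free_pages):
            if page not in excluded:
                return self.free_pages.pop(i)
        return None

    def _shut_down(self, vm: int) -> None:
        # Drop every mapping owned by the partition and mark it stopped.
        for gpage, mpage in self.p2m.pop(vm, {}).items():
            refs = self.rmap.get(mpage)
            if refs is not None:
                refs.discard((vm, gpage))
                if not refs:
                    del self.rmap[mpage]
        self.shut_down.add(vm)

    def handle_module_event(
        self,
        module_pages: Set[int],
        notify: Callable[[int, int], bool],
    ) -> Dict[int, str]:
        """Handle a module going off-line, failing, or slowing down.

        notify(vm, mpage) stands in for a fault or other notification and
        returns True if the virtual machine responded.
        """
        outcome: Dict[int, str] = {}
        for mpage in sorted(module_pages):
            users = self.rmap.get(mpage)
            if not users:
                continue  # no virtual machine uses this page
            replacement = self._take_free_page(excluded=module_pages)
            if replacement is not None:
                # Re-map every user of the page to the replacement page.
                for vm, gpage in users:
                    self.p2m[vm][gpage] = replacement
                self.rmap[replacement] = self.rmap.pop(mpage)
                outcome[mpage] = "remapped"
                continue
            # No spare page: notify each user, shut down non-responders.
            for vm in sorted({vm for vm, _ in users}):
                if vm not in self.shut_down and not notify(vm, mpage):
                    self._shut_down(vm)
            outcome[mpage] = "notified"
        return outcome
```

In this sketch the reverse map makes the lookup of affected virtual machines a single dictionary access per machine page, so the hypervisor does not need to scan every guest's page tables when a module changes state.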