A method and system for machine memory power and availability management
in a processing system supporting multiple virtual machines provides a
mechanism for supporting memory power management and memory failure
management in large-scale systems. Upon a decision to take a memory
module off-line or place the module in an increased-latency state for
power management, or upon a notification that a memory module has failed
or has been taken off-line, or has had its latency increased by another power
management control mechanism, a hypervisor that supports multiple virtual
machines checks the use of pages by each virtual machine and its guest
operating system by using a reverse mapping. The hypervisor determines
which virtual machines are using a particular machine memory page and may
re-map that page to another available machine memory page, or may notify
the affected virtual machines, via a fault or other notification
mechanism, that the page has become or is becoming unavailable.
Alternatively, or in the absence of a response from a virtual machine,
the hypervisor can shut down the affected partition(s).
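The flow above can be illustrated with a minimal Python sketch. It assumes a toy model that the abstract does not specify: a per-VM forward map (`p2m`, guest page to machine page), a reverse map (`rmap`, machine page to the set of `(vm, guest page)` users), a pool of spare machine pages on modules that stay online, and a hypothetical `notify(vm, guest_page)` callback that returns `True` if the guest releases the page. All of these names are illustrative, as is the policy of re-mapping when a spare page exists and otherwise notifying guests. A real hypervisor would also copy page contents and invalidate stale translations before switching mappings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical notifier: returns True if the guest acknowledged and released the page.
Notifier = Callable[[str, int], bool]


@dataclass
class Hypervisor:
    # Per-VM forward map: guest page number -> machine page number.
    p2m: Dict[str, Dict[int, int]] = field(default_factory=dict)
    # Reverse map: machine page number -> {(vm_id, guest page number)}.
    rmap: Dict[int, Set[Tuple[str, int]]] = field(default_factory=dict)
    # Spare machine pages, assumed to lie on modules that remain online.
    free_pages: List[int] = field(default_factory=list)
    shut_down: Set[str] = field(default_factory=set)

    def map_page(self, vm: str, gpn: int, mpn: int) -> None:
        # Keep the forward and reverse maps in sync.
        self.p2m.setdefault(vm, {})[gpn] = mpn
        self.rmap.setdefault(mpn, set()).add((vm, gpn))

    def _shut_down(self, vm: str) -> None:
        # Drop every mapping owned by the partition, then mark it as shut down.
        for gpn, mpn in self.p2m.pop(vm, {}).items():
            users = self.rmap.get(mpn)
            if users is not None:
                users.discard((vm, gpn))
                if not users:
                    del self.rmap[mpn]
        self.shut_down.add(vm)

    def evacuate_page(self, mpn: int, notify: Notifier) -> None:
        # Reverse-map lookup: which VMs use this machine page?
        users = self.rmap.pop(mpn, set())
        if not users:
            return
        if self.free_pages:
            # Option 1: re-map every user to a spare machine page
            # (content copy and translation invalidation omitted).
            new_mpn = self.free_pages.pop()
            for vm, gpn in users:
                self.p2m[vm][gpn] = new_mpn
            self.rmap[new_mpn] = users
            return
        # Option 2: no spare page, so notify each user; shut down non-responders.
        for vm, gpn in sorted(users):
            if vm in self.shut_down:
                continue
            if notify(vm, gpn):
                del self.p2m[vm][gpn]  # Guest stopped using the page.
            else:
                self._shut_down(vm)

    def take_module_offline(self, pages: Iterable[int], notify: Notifier) -> None:
        # Evacuate each machine page that belongs to the affected module.
        for mpn in pages:
            self.evacuate_page(mpn, notify)


# Example: one spare page; vm1 and vm2 share machine page 10.
hv = Hypervisor(free_pages=[100])
hv.map_page("vm1", 0, 10)
hv.map_page("vm2", 5, 10)
hv.map_page("vm2", 6, 11)
hv.map_page("vm3", 1, 12)
responders = {"vm2"}  # vm3 never answers the notification
hv.take_module_offline([10, 11, 12], lambda vm, gpn: vm in responders)
```

In this run, page 10 is re-mapped to spare page 100 for both sharers, vm2 releases page 11 when notified, and vm3, which does not respond about page 12, is shut down.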