An apparatus and method of repairing a processor array for a failure
detected at runtime in a system supporting persistent component
deallocation are provided. The apparatus and method of the present
invention allow redundant array bits to be used for recoverable faults
detected in arrays during runtime, instead of only at system boot, while
still maintaining the dynamic and persistent processor deallocation
features of the computing system. With the apparatus and method of the
present invention, a failure of a cache array is detected and a
determination is made as to whether a repairable failure threshold is
exceeded during runtime. If this threshold is exceeded, a determination is
made as to whether cache array redundancy may be applied to correct the
failure, i.e., a bit error. If so, the cache array redundancy is applied
without marking the processor as unavailable. At some time later, the
system undergoes a re-initial program load (re-IPL) at which time it is
determined whether a second failure of the processor occurs. If a second
failure occurs, a determination is made as to whether any status bits are
set for arrays other than the cache array that experienced the present
failure. If so, the processor is marked unavailable. If not, a
determination is made as to whether cache redundancy can be applied to
correct the failure. If so, the failure is corrected using the cache
redundancy. If not, the processor is marked unavailable.
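
The decision flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Processor`, `handle_runtime_failure`, `handle_reipl_failure`, and `apply_redundancy`, the threshold value, and the spare-bit count are all hypothetical. The sketch also assumes that a status bit is recorded for an array when redundancy is applied to it, and that a runtime failure which cannot be repaired leads to deallocation; the text above does not specify either point.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    NO_ACTION = "no_action"      # threshold not exceeded, nothing to do yet
    REPAIRED = "repaired"        # redundant array bits were applied
    DEALLOCATED = "deallocated"  # processor marked unavailable


REPAIRABLE_THRESHOLD = 3  # hypothetical value, not taken from the source


@dataclass
class Processor:
    spare_bits: int = 1                            # assumed unused redundant cache bits
    error_count: int = 0                           # recoverable errors seen at runtime
    status_bits: set = field(default_factory=set)  # arrays with a recorded failure
    available: bool = True


def apply_redundancy(cpu: Processor, array: str) -> bool:
    """Consume one spare bit for `array` if any remain (assumed model)."""
    if cpu.spare_bits > 0:
        cpu.spare_bits -= 1
        cpu.status_bits.add(array)  # assumption: repair sets the array's status bit
        return True
    return False


def handle_runtime_failure(cpu: Processor, array: str) -> Action:
    """Runtime path: repair in place once the threshold is exceeded."""
    cpu.error_count += 1
    if cpu.error_count <= REPAIRABLE_THRESHOLD:
        return Action.NO_ACTION
    if apply_redundancy(cpu, array):
        cpu.error_count = 0
        return Action.REPAIRED  # processor stays available
    # Assumption: the source does not state the runtime fallback.
    cpu.available = False
    return Action.DEALLOCATED


def handle_reipl_failure(cpu: Processor, array: str) -> Action:
    """Re-IPL path: a second failure is checked against other arrays first."""
    if cpu.status_bits - {array}:
        # Status bits are set for arrays other than the failing one.
        cpu.available = False
        return Action.DEALLOCATED
    if apply_redundancy(cpu, array):
        return Action.REPAIRED
    cpu.available = False
    return Action.DEALLOCATED
```

In this sketch, a processor with spare bits remaining is repaired and stays available, while a processor with status bits set for a different array, or with no spare bits left, is marked unavailable at re-IPL.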