A RAID controller includes a non-volatile memory storing a first program and
first and second copies of a second program, and a processor executing
the first program. The first program detects that the first copy of the
second program has failed and repairs the failed first copy in the
non-volatile memory using the second copy. The failures may be detected at boot time
or during normal operation of the controller. In one embodiment, the
failure is detected via a CRC check. In one embodiment, the controller
repairs the failed copy by copying the good copy to the location of the
failed copy. In one embodiment, the system includes multiple controllers,
each having its own processor, non-volatile memory, and program that
detects and repairs failed program copies. The programs include a loader,
an application, FPGA code, CPLD code, and a program for execution by a
power supply microcontroller.
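
The detect-and-repair behavior described above can be illustrated with a minimal sketch. It is not the claimed implementation: the non-volatile memory is modeled as a Python bytearray, and the names CopyRegion, copy_is_good, and check_and_repair are hypothetical, as is the assumption that both copies have the same length and that each copy's expected CRC-32 is stored separately from the image.

```python
import zlib
from dataclasses import dataclass


@dataclass
class CopyRegion:
    """Hypothetical descriptor for one stored copy of a program."""
    offset: int      # start of the copy within the non-volatile memory
    length: int      # size of the copy in bytes
    stored_crc: int  # CRC-32 recorded when the copy was written


def copy_is_good(flash: bytearray, region: CopyRegion) -> bool:
    """Return True if the copy's computed CRC-32 matches its stored CRC."""
    data = bytes(flash[region.offset:region.offset + region.length])
    return zlib.crc32(data) == region.stored_crc


def check_and_repair(flash: bytearray, primary: CopyRegion,
                     secondary: CopyRegion) -> str:
    """Detect a failed copy and overwrite it with the good copy.

    Assumes both regions have equal length (an assumption of this sketch).
    """
    primary_ok = copy_is_good(flash, primary)
    secondary_ok = copy_is_good(flash, secondary)

    if primary_ok and secondary_ok:
        return "ok"
    if not primary_ok and not secondary_ok:
        # No good copy is left to repair from.
        return "unrecoverable"

    good, bad = (secondary, primary) if secondary_ok else (primary, secondary)
    # Repair: copy the good image to the location of the failed copy.
    flash[bad.offset:bad.offset + bad.length] = \
        flash[good.offset:good.offset + good.length]
    bad.stored_crc = good.stored_crc
    return "repaired primary" if bad is primary else "repaired secondary"
```

In this sketch the same routine could be called at boot time or periodically during normal operation, since it only reads the two regions and writes to one of them when a mismatch is found.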