A method, system, computer system, and computer-readable medium for performing
root cause analysis of a failure of an application program in a clustering
environment. Upon the occurrence of a problem or event of interest,
cluster configuration data can be obtained to provide a common context
for events occurring at different software layers supporting an
application. Diagnostic information produced by the different software
layers can be obtained from various log files, which are typically in
different formats and on different nodes in the cluster. The diagnostic
information can be viewed in the context of the cluster and filtered to
identify events related to the failure. The related events can be
presented in a time-ordered sequence to assist in analysis of the event
of interest. Patterns of events that led to the failure can be identified
and documented for use in further problem analysis and for taking
preventative and/or corrective measures.
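
The flow described above (obtain cluster configuration, collect diagnostic information from differently formatted logs on different nodes, filter for related events, and present them in time order) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the two log formats, the `CLUSTER_CONFIG` mapping, and the names `Event`, `parse_cluster_line`, `parse_storage_line`, and `timeline` are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A log entry normalized into one common shape across software layers."""
    time: datetime
    node: str
    layer: str
    resource: str
    message: str


# Hypothetical cluster configuration: the resources that support each application.
CLUSTER_CONFIG = {"app1": {"app1", "vol1", "nic0"}}

# Hypothetical format A (cluster layer): "2024/01/05 10:00:03 nodeA app1 text"
CLUSTER_LINE = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\S+) (.*)")
# Hypothetical format B (storage layer, one log per node): "2024-01-05T10:00:01 vol1 text"
STORAGE_LINE = re.compile(r"(\S+) (\S+) (.*)")


def parse_cluster_line(line: str) -> Optional[Event]:
    """Parse a format A line; return None if the line does not match."""
    m = CLUSTER_LINE.fullmatch(line)
    if m is None:
        return None
    ts, node, resource, message = m.groups()
    when = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")
    return Event(when, node, "cluster", resource, message)


def parse_storage_line(line: str, node: str) -> Optional[Event]:
    """Parse a format B line; the node is supplied by the caller."""
    m = STORAGE_LINE.fullmatch(line)
    if m is None:
        return None
    ts, resource, message = m.groups()
    try:
        when = datetime.fromisoformat(ts)
    except ValueError:
        return None  # first field is not a timestamp
    return Event(when, node, "storage", resource, message)


def timeline(app, cluster_lines, storage_lines_by_node):
    """Return events related to `app`, ordered by time."""
    events = [parse_cluster_line(line) for line in cluster_lines]
    for node, lines in storage_lines_by_node.items():
        events += [parse_storage_line(line, node) for line in lines]
    # The cluster configuration gives the common context used for filtering.
    wanted = CLUSTER_CONFIG[app]
    related = [e for e in events if e is not None and e.resource in wanted]
    return sorted(related, key=lambda e: e.time)
```

In this sketch the cluster configuration serves only as a filter on resource names. Identifying recurring patterns of events that led to a failure, as the abstract describes, would be a separate step operating on the ordered list that `timeline` returns.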