The present invention provides a method and a system for identifying
relevant information in a data set. The method involves the
identification of nodes of interest in a tree structure. A node of
interest is a node that contains information that is relevant to a
pre-defined context. The method further involves the step of iteratively
extracting sub-trees from the tree structure and identifying records in
the extracted sub-trees. Each sub-tree is a hierarchical structure that
shows the relationship of each node of interest with its ancestor nodes
in the tree structure. Each record is a group of sub-tree nodes and
contains at least one node of interest.
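
The steps above can be illustrated with a minimal Python sketch. It is not
the claimed implementation: the Node class, the is_of_interest predicate, and
the record-grouping rule (a parent together with its direct children that are
nodes of interest) are all assumptions made for illustration, and the sketch
extracts the sub-tree in a single recursive pass rather than iteratively.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical tree node; the source does not prescribe a layout."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def extract_subtree(node: Node,
                    is_of_interest: Callable[[Node], bool]) -> Optional[Node]:
    """Return a pruned copy that keeps only nodes of interest and their
    ancestors, or None if this branch contains no node of interest."""
    kept = []
    for child in node.children:
        pruned = extract_subtree(child, is_of_interest)
        if pruned is not None:
            kept.append(pruned)
    if kept or is_of_interest(node):
        return Node(node.label, node.text, kept)
    return None


def identify_records(subtree: Node,
                     is_of_interest: Callable[[Node], bool]) -> List[List[str]]:
    """Assumed grouping rule: a record is a parent label followed by the
    labels of its direct children that are nodes of interest."""
    records = []
    hits = [c for c in subtree.children if is_of_interest(c)]
    if hits:
        records.append([subtree.label] + [c.label for c in hits])
    for child in subtree.children:
        records.extend(identify_records(child, is_of_interest))
    return records


# Assumed pre-defined context: product names and prices are relevant.
def is_of_interest(node: Node) -> bool:
    return node.label in {"name", "price"}


root = Node("root", children=[
    Node("item1", children=[Node("name", "Laptop"), Node("price", "$999")]),
    Node("nav", children=[Node("link", "Home")]),
    Node("item2", children=[Node("name", "Phone"), Node("ad", "Buy now")]),
])

subtree = extract_subtree(root, is_of_interest)
records = identify_records(subtree, is_of_interest)
print(records)
```

In this sketch the "nav" branch and the "ad" node are dropped because neither
contains a node of interest, and every resulting record holds at least one
node of interest, as the description requires.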