The text format of the input data is checked, and the data is converted
into a format that the system can manipulate. It is further determined,
using tags, heading information, and the like, whether the input data is
in an HTML or e-mail format. The converted data is coarsely divided into
blocks, such that the elements within each block can be checked, based on
the repetition of predetermined character patterns. Each block is marked
with a tag identifying it as a block. The block-divided data is then
parsed based on tags, character patterns, and the like, and is
structured. Any table in the text is also parsed and segmented into
cells. Finally, hierarchical tree-structured data is generated from the
structured data. Sentences are then extracted using a sentence-extraction
template paired with the tree-structured data.
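
The flow described above can be illustrated with a minimal Python sketch. It is not the implementation described in the text, only one possible reading of it under stated assumptions: the format check is a simple regular-expression heuristic; the "predetermined character patterns" are taken to be blank lines and lines that repeat a single punctuation character; tables are assumed to be pipe-delimited; and the template is assumed to be a path of tags from the root of the tree. All names (detect_format, split_blocks, build_tree, extract, Node, SAMPLE) are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the tree: a tag, optional text, and child nodes."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def detect_format(text):
    """Guess the input format from tags or heading lines (assumed heuristic)."""
    if re.search(r"<\s*(html|body|p|table|div)\b", text, re.IGNORECASE):
        return "html"
    if re.search(r"^(From|To|Subject):\s", text, re.MULTILINE):
        return "email"
    return "plain"


# Assumed block delimiter: one punctuation character repeated three or more times.
SEPARATOR = re.compile(r"^\s*([-=*_#])\1{2,}\s*$")


def split_blocks(text):
    """Coarsely divide text into blocks at blank lines and separator lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line) or not line.strip():
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def build_tree(text):
    """Tag each block, segment tables into cells, and build the tree."""
    root = Node("document", detect_format(text))
    for lines in split_blocks(text):
        if all("|" in ln for ln in lines):
            # Assumed table syntax: every line of the block is pipe-delimited.
            block = Node("table")
            for ln in lines:
                cells = ln.strip().strip("|").split("|")
                row = Node("row", children=[Node("cell", c.strip()) for c in cells])
                block.children.append(row)
        else:
            block = Node("block")
            paragraph = " ".join(ln.strip() for ln in lines)
            # Split after sentence-ending punctuation followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                if sentence:
                    block.children.append(Node("sentence", sentence))
        root.children.append(block)
    return root


def extract(node, template):
    """Collect the text of every node reached by following the tag path."""
    if node.tag != template[0]:
        return []
    if len(template) == 1:
        return [node.text]
    found = []
    for child in node.children:
        found.extend(extract(child, template[1:]))
    return found


SAMPLE = """Subject: Weekly report
-----
Sales rose this week. Costs were flat.

| item | qty |
| pens | 12 |
"""

tree = build_tree(SAMPLE)
sentences = extract(tree, ("document", "block", "sentence"))
cells = extract(tree, ("document", "table", "row", "cell"))
```

With the sample input, the template ("document", "block", "sentence") selects the sentences of the non-table blocks, while ("document", "table", "row", "cell") selects the table cells. A tag-path template is only one possible design; the text does not specify how the template is matched against the tree.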