An apparatus and method are disclosed for easily generating document data (a tag
file) in a form that allows various processes to be performed on the document
data. An original document (plain text) is divided into morphological elements,
and morphological information is added to each element. Information representing the hierarchical
structure of the document is also added. Furthermore, information indicating referential
relations between portions of the original document is also added.
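
The following is a minimal illustrative sketch of the three kinds of added information, not the disclosed implementation. It makes several assumptions: the tag file is XML, a regular-expression tokenizer stands in for true morphological analysis, and the names build_tag_file, paragraph, sentence, word, kind, and ref are hypothetical and do not come from the disclosure.

```python
import re
import xml.etree.ElementTree as ET


def build_tag_file(plain_text, references=()):
    """Return an XML string for plain_text (illustrative sketch only).

    references is an iterable of (source_id, target_id) pairs of word ids,
    for example a pronoun and the word it refers to.
    """
    root = ET.Element("document")
    word_id = 0
    # Hierarchical structure: document > paragraph > sentence > word.
    paragraphs = (p.strip() for p in plain_text.split("\n\n"))
    for paragraph in filter(None, paragraphs):
        para_el = ET.SubElement(root, "paragraph")
        for sentence in re.findall(r"[^.!?]+[.!?]?", paragraph):
            # Assumption: a crude tokenizer replaces morphological analysis.
            tokens = re.findall(r"\w+|[^\w\s]", sentence)
            if not tokens:
                continue
            sent_el = ET.SubElement(para_el, "sentence")
            for token in tokens:
                word_id += 1
                # "kind" is a placeholder for real morphological information.
                kind = "word" if token[0].isalnum() else "punct"
                el = ET.SubElement(sent_el, "word", id=f"w{word_id}", kind=kind)
                el.text = token
    # Referential relations: link a source element to its target by id.
    by_id = {el.get("id"): el for el in root.iter("word")}
    for source_id, target_id in references:
        by_id[source_id].set("ref", target_id)
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage: "He" (w6) refers back to "John" (w1).
tag_file = build_tag_file("John bought a book. He read it.", [("w6", "w1")])
```

Because every element carries an id, later processes can follow a ref attribute to its target without re-analyzing the plain text.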