Word boundary identification operations such as morphological analysis are performed
on documents to be registered, and the start positions and end positions of words
are identified. Word boundary information is obtained based on these identification
results. Search indexes are created for substrings of a predetermined length (n-grams)
extracted from the document being registered. For each n-gram in a document, the
search index includes document identification information, occurrence position
information indicating that the n-gram is located at the n-th position from the
beginning of the text data, and word boundary information.
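
The index structure described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Posting`, `word_boundaries`, and `index_document` are hypothetical, a simple regular-expression tokenizer stands in for morphological analysis, positions are assumed to be 0-based character offsets, and the word boundary information is assumed to be a pair of flags recording whether the n-gram begins at a word start and ends at a word end.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One index entry for an n-gram (hypothetical layout)."""
    doc_id: int        # document identification information
    position: int      # 0-based offset of the n-gram in the text (assumption)
    starts_word: bool  # n-gram begins at the start position of a word
    ends_word: bool    # n-gram ends at the end position of a word


def word_boundaries(text):
    """Stand-in for morphological analysis: treat each run of word
    characters as one word and return the sets of start and end offsets."""
    starts, ends = set(), set()
    for match in re.finditer(r"\w+", text):
        starts.add(match.start())
        ends.add(match.end() - 1)  # offset of the word's last character
    return starts, ends


def index_document(index, doc_id, text, n=2):
    """Add every n-gram of `text` to `index` with its position and
    word boundary flags."""
    starts, ends = word_boundaries(text)
    for pos in range(len(text) - n + 1):
        gram = text[pos:pos + n]
        index[gram].append(
            Posting(doc_id, pos, pos in starts, (pos + n - 1) in ends)
        )


index = defaultdict(list)
index_document(index, 1, "cat scat")
```

With this layout, a search for the word "cat" could keep only candidates whose first n-gram has `starts_word` set, which would exclude the occurrence of "ca" inside "scat" at offset 5.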