A similar-document retrieval method and system for retrieving similar
documents, with high accuracy and little retrieval noise, from a document
database that stores documents written in different languages, even when
the number of registered documents differs from language to language.
Statistical
information about the documents being registered is collected
on a language-by-language basis at registration time. Upon retrieval
of documents similar to a query document, weights of words extracted from
the query document are taken into account on a language-by-language
basis by referencing the statistical information.
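
As an illustration only, per-language statistics and query-word weighting might be sketched as follows. The abstract does not name a weighting formula; this sketch assumes a smoothed inverse document frequency, and the class and method names (PerLanguageStats, register, idf, query_weights) are hypothetical.

```python
import math
from collections import Counter, defaultdict


class PerLanguageStats:
    """Hypothetical store of document statistics kept separately per language."""

    def __init__(self):
        # language -> number of registered documents in that language
        self.doc_count = Counter()
        # language -> word -> number of documents containing the word
        self.doc_freq = defaultdict(Counter)

    def register(self, language, words):
        """Update the statistics of one language when a document is registered."""
        self.doc_count[language] += 1
        for word in set(words):  # count each word once per document
            self.doc_freq[language][word] += 1

    def idf(self, language, word):
        """Smoothed inverse document frequency (an assumed weighting scheme),
        computed only from the statistics of the given language."""
        n = self.doc_count[language]
        df = self.doc_freq[language][word]
        return math.log((n + 1) / (df + 1)) + 1.0

    def query_weights(self, language, words):
        """Weight each query word by term frequency times same-language IDF."""
        tf = Counter(words)
        return {w: c * self.idf(language, w) for w, c in tf.items()}
```

Because each language has its own counts, a word's weight is not distorted when one language has far more registered documents than another, which is the imbalance the abstract describes.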