An index generator and query expander for use in information retrieval in a
corpus. A corpus is provided as an input to an inflectional analyzer,
which produces a lemmatized corpus having base forms and associated
inflections for each word in the original corpus. The lemmatized corpus is
provided as an input to a disambiguator, which performs part-of-speech
tagging and morphosyntactic disambiguation to produce a disambiguated
corpus. The disambiguated corpus is provided as an input to a derivational
generator, which produces an expanded corpus having all possible valid
derivatives of each word of the disambiguated corpus. The disambiguated
corpus is provided as an input to a transformational analyzer, which uses a
grammar and a metagrammar to analyze syntactic and morphosyntactic
variations, conflating and generating variants to produce an index to the
corpus having a minimum of variants. Alternatively, a query expander is
provided utilizing similar techniques.
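
The stages described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the lexicon, the derivational families, the disambiguation rule, and the three transformation patterns are all hypothetical toy data chosen for illustration, and the derivational step is simplified to mapping each lemma to a shared family root rather than generating every valid derivative. The function names are likewise assumptions.

```python
from collections import defaultdict

# Toy lexicon (hypothetical): surface form -> candidate (lemma, part-of-speech) readings.
LEXICON = {
    "pressure": [("pressure", "NOUN")],
    "measurement": [("measurement", "NOUN")],
    "measurements": [("measurement", "NOUN")],
    "measures": [("measure", "NOUN"), ("measure", "VERB")],
    "sensor": [("sensor", "NOUN")],
    "the": [("the", "DET")],
    "of": [("of", "PREP")],
}
# Toy derivational families (hypothetical): lemma -> shared family root.
FAMILY = {"measurement": "measure", "measure": "measure"}
# Toy transformation rules (hypothetical): POS pattern -> (head position, modifier position).
PATTERNS = {
    ("NOUN", "NOUN"): (1, 0),          # "pressure measurement"
    ("NOUN", "PREP", "NOUN"): (0, 2),  # "measurement of pressure"
    ("VERB", "DET", "NOUN"): (0, 2),   # "measures the pressure"
}


def inflectional_analyze(text):
    """Map each token to its candidate (lemma, POS) readings."""
    return [LEXICON.get(tok, [(tok, "UNK")]) for tok in text.lower().split()]


def disambiguate(lemmatized):
    """Keep one reading per token: prefer VERB right after a NOUN, else the first reading."""
    chosen = []
    for readings in lemmatized:
        pick = readings[0]
        if chosen and chosen[-1][1] == "NOUN":
            pick = next((r for r in readings if r[1] == "VERB"), pick)
        chosen.append(pick)
    return chosen


def derive(tagged):
    """Simplified derivational step: replace each lemma with its family root."""
    return [(FAMILY.get(lemma, lemma), pos) for lemma, pos in tagged]


def conflate(tokens):
    """Yield one canonical (head, modifier) key for every matched variant."""
    for pattern, (head, mod) in PATTERNS.items():
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(pos for _, pos in window) == pattern:
                yield (window[head][0], window[mod][0])


def analyze(text):
    """Run the full chain of stages on one text and return its canonical keys."""
    return list(conflate(derive(disambiguate(inflectional_analyze(text)))))


def build_index(corpus):
    """Index documents under conflated keys so that variants share one entry."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for key in analyze(text):
            index[key].add(doc_id)
    return index


def search(index, query):
    """Apply the same analysis to a query and collect the matching documents."""
    hits = set()
    for key in analyze(query):
        hits |= index.get(key, set())
    return hits


corpus = {
    1: "pressure measurement",
    2: "measurements of pressure",
    3: "sensor measures the pressure",
}
index = build_index(corpus)
print(sorted(index[("measure", "pressure")]))  # [1, 2, 3]
```

In this sketch the three surface variants collapse to the single key `("measure", "pressure")`, so the index holds one entry rather than three. Calling `search(index, "measurement of pressure")` runs the query through the same stages and retrieves all three documents, which is one way the query-side alternative could reuse the techniques.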