A system for generating language modeling data for a speech recognition
system includes an expression extractor to extract expressions from
domain-specific data of an existing domain using a base of linguistic
knowledge, a concept structure mapper to map the extracted expressions to
expressions in a new domain using vocabulary for the new domain, a
concatenation module to concatenate the extracted expressions with
domain-general data, and a filter arrangement to identify and filter out
unrealistic expressions among the mapped or concatenated expressions.