Generating a transfer dictionary used in a transfer-based machine
translation system. A pair of source/target language sentences is received.
The source language sentence comprises at least one marked idiom, at
least one argument and at least one marked collocation. The target
language sentence comprises the target language translation for the idiom
and the source language word(s) for the argument. The source language
sentence is parsed to generate a source language syntactic tree. Nodes
are extracted from the source language syntactic tree. A least common
ancestor node of the extracted nodes is calculated and source language
structure information is generated based on the source language syntactic
tree data structure. Target language structure information is generated
by adding part-of-speech information to each morpheme in the target
language sentence and by replacing each source language word in the
target language sentence with the corresponding syntactic information
from the source language syntactic tree.
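
The common-ancestor step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the source: the `Node` class, the helper names, the toy parse of "He kicked the bucket yesterday", and the choice of which nodes count as extracted. The sketch walks from each extracted node up to the root and keeps the deepest node shared by all of those paths.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical syntactic-tree node with a parent pointer."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(node: Node) -> List[Node]:
    """Return the list of nodes from the root down to node."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path[::-1]


def lowest_common_ancestor(nodes: List[Node]) -> Optional[Node]:
    """Return the deepest node lying on every root-to-node path.

    Returns None if the nodes do not share a root.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    paths = [path_from_root(n) for n in nodes]
    lca = None
    # Walk all paths in lockstep from the root; stop at the first divergence.
    for level in zip(*paths):
        if all(n is level[0] for n in level):
            lca = level[0]
        else:
            break
    return lca


# Toy parse (assumed): S -> NP(He) VP(V(kicked) NP(Det(the) N(bucket)) AdvP(yesterday))
root = Node("S")
he = root.add(Node("NP")).add(Node("He"))
vp = root.add(Node("VP"))
kicked = vp.add(Node("V")).add(Node("kicked"))
obj = vp.add(Node("NP"))
the = obj.add(Node("Det")).add(Node("the"))
bucket = obj.add(Node("N")).add(Node("bucket"))
yesterday = vp.add(Node("AdvP")).add(Node("yesterday"))

idiom_lca = lowest_common_ancestor([kicked, the, bucket])  # the VP node
full_lca = lowest_common_ancestor([he, kicked, bucket])    # the S node
```

In this toy tree, the words of the idiom alone meet at the verb phrase, while adding the argument "He" lifts the common ancestor to the sentence node, so the subtree under that ancestor covers both the idiom and its argument.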