A system and method for converting legacy and proprietary documents into
Extensible Markup Language (XML) format, in which the conversion is treated
as transforming ordered trees of one schema and/or model into ordered trees
of another schema and/or model. In embodiments, the tree transformers are
coded using a learning method that decomposes the conversion task into
three components, which include path re-labeling, structural composition,
and input tree traversal, each of which involves learning approaches. The
transformation of an input tree into an output tree may involve labeling
components in the input tree with valid labels or paths from a particular
output schema, composing the labeled elements into the output tree with a
valid structure, and finding a traversal of the input tree that achieves
the correct composition of the output tree and applies structural rules.
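
The three components described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the hand-written rules in `relabel` stand in for a learned path classifier, and the function names, the sample input, and the output paths (`front/title`, `front/author`, `body/para`) are all hypothetical, chosen only for illustration. The sketch assumes a fixed depth-first, document-order traversal, whereas the abstract describes the traversal as something to be found.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy input, already parsed as an ordered tree.
SAMPLE = (
    "<html><body>"
    "<h1>Tree Transformers</h1>"
    '<p class="by">Example Author</p>'
    "<p>First paragraph.</p>"
    "<p>Second paragraph.</p>"
    "</body></html>"
)


def traverse(elem, prefix=""):
    """Input tree traversal: yield (input_path, leaf) in document order."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    if len(elem) == 0:  # no child elements, so this is a leaf
        yield path, elem
    for child in elem:
        yield from traverse(child, path)


def relabel(path, elem):
    """Path re-labeling: map an input leaf to a path in the output schema.

    Hand-written rules used as a stand-in for a learned classifier.
    Returns None when the leaf has no counterpart in the output.
    """
    if path.endswith("/h1"):
        return "front/title"
    if elem.get("class") == "by":
        return "front/author"
    if path.endswith("/p"):
        return "body/para"
    return None


def compose(root, out_path, text):
    """Structural composition: attach text under out_path below root."""
    parts = out_path.split("/")
    parent = root
    for tag in parts[:-1]:
        container = parent.find(tag)  # reuse an existing container element
        if container is None:
            container = ET.SubElement(parent, tag)
        parent = container
    leaf = ET.SubElement(parent, parts[-1])  # leaves are always appended
    leaf.text = text


def convert(source):
    """Transform the input tree into an output tree rooted at <article>."""
    root = ET.Element("article")
    for path, elem in traverse(ET.fromstring(source)):
        out_path = relabel(path, elem)
        if out_path is not None:
            compose(root, out_path, (elem.text or "").strip())
    return root


result = ET.tostring(convert(SAMPLE), encoding="unicode")
```

In this sketch the order in which leaves are visited determines the order of the `para` elements in the output, which is why the choice of traversal matters for producing a correctly composed output tree.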