A system and method for converting legacy and proprietary documents into
Extensible Markup Language (XML) format, in which the conversion is treated
as transforming ordered trees of one schema and/or model into ordered trees
of another schema and/or model. In embodiments, the tree transformers are
coded using a learning method that decomposes the conversion task into
three components: path re-labeling, structural composition, and input tree
traversal, each of which involves learning approaches. The
transformation of an input tree into an output tree may involve
decomposing the input document, labeling components in the input tree
with valid labels or paths from a particular output schema, composing the
labeled elements into the output tree with a valid structure, and finding
a traversal of the input tree that achieves the correct composition of the
output tree and applies structural rules.
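
The three components named above (path re-labeling, structural composition, and input tree traversal) can be illustrated with a minimal sketch. It is not the claimed implementation: the `RELABEL` table is a hand-written, hypothetical stand-in for a learned re-labeler, the traversal is a fixed depth-first walk rather than a learned one, and the input paths, output paths, and function names are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a learned path re-labeler: it maps a path in the
# input tree to a path that is assumed valid in the output schema.
RELABEL = {
    "html/body/h1": "article/title",
    "html/body/p/b": "article/author/name",
    "html/body/p": "article/body/para",
}

def traverse(node, prefix=""):
    """Depth-first, document-order walk yielding (input_path, text) pairs.

    Assumption: a fixed traversal order is enough for this toy input, and
    tail text (mixed content) is ignored.
    """
    path = f"{prefix}/{node.tag}" if prefix else node.tag
    text = (node.text or "").strip()
    if text:
        yield path, text
    for child in node:
        yield from traverse(child, path)

def compose(labeled):
    """Build the output tree from (output_path, text) pairs.

    Intermediate elements are reused when present; the final step of each
    path always creates a new element, so repeated items stay separate.
    """
    root = None
    for out_path, text in labeled:
        steps = out_path.split("/")
        if root is None:
            root = ET.Element(steps[0])
        node = root
        for tag in steps[1:-1]:
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        leaf = ET.SubElement(node, steps[-1])
        leaf.text = text
    return root

def convert(source_xml):
    """Traverse the input tree, re-label its components, compose the output."""
    src = ET.fromstring(source_xml)
    labeled = [(RELABEL[p], t) for p, t in traverse(src) if p in RELABEL]
    return compose(labeled)

# Toy legacy document (well-formed markup, assumed for the example).
SOURCE = (
    "<html><body><h1>Tree Transducers</h1>"
    "<p><b>A. Author</b></p>"
    "<p>First paragraph.</p><p>Second paragraph.</p>"
    "</body></html>"
)

result = convert(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

In this sketch the order in which the traversal emits labeled components determines the order of siblings in the output tree, which is why the choice of traversal is treated as a component in its own right.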