A method and system are provided for inferring a schema from an electronic document
containing hierarchical data for use in a spreadsheet application program. The
electronic document containing the hierarchical data is received into an application
program. The application program may be a spreadsheet application program. The
format of the hierarchical data structure may be XML. The hierarchical data includes
a set of nodes making up the structure of the hierarchical data. The nodes may
be XML elements and attributes. The hierarchical data is then parsed to discover
one of the nodes in the hierarchical data. Once the node has been discovered, content
associated with the discovered node is saved to a memory location in the computer
system. The content may include data associated with the discovered node and the
type of data associated with the node. The hierarchical data is then parsed again
to discover subsequent nodes until the content for all of the nodes has been saved
to the memory location. A schema generator then generates a schema element for
each discovered node, applying complex rules based on the particular qualities
of that node, until a schema is generated for the hierarchical data.
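
The two phases described above (saving the content and data type of every node, then generating a schema element per node) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (infer_type, collect_nodes, merge_types, generate_schema), the path-keyed dictionary standing in for the "memory location", and the three-type inference rule are all assumptions made for illustration. It also ignores namespaces, repeated-element cardinality, and elements that carry both text and attributes.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def infer_type(text):
    """Guess a simple XSD type from sample text (hypothetical rule set)."""
    if text is None or not text.strip():
        return None
    value = text.strip()
    for parser, xsd_type in ((int, "xs:integer"), (float, "xs:double")):
        try:
            parser(value)
            return xsd_type
        except ValueError:
            continue
    return "xs:string"


def collect_nodes(xml_text):
    """Phase 1: discover each node and save its kind, data types, and children."""
    root = ET.fromstring(xml_text)
    saved = {}  # node path -> saved content (stands in for the memory location)

    def entry_for(path, kind):
        return saved.setdefault(path, {"kind": kind, "types": set(), "children": []})

    def visit(elem, path):
        entry = entry_for(path, "element")
        found = infer_type(elem.text)
        if found:
            entry["types"].add(found)
        for name, value in elem.attrib.items():
            attr_path = f"{path}/@{name}"
            entry_for(attr_path, "attribute")["types"].add(infer_type(value) or "xs:string")
            if attr_path not in entry["children"]:
                entry["children"].append(attr_path)
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if child_path not in entry["children"]:
                entry["children"].append(child_path)
            visit(child, child_path)

    visit(root, root.tag)
    return root.tag, saved


def merge_types(types):
    """Pick one type that covers every sample seen for a node."""
    if not types:
        return "xs:string"
    if len(types) == 1:
        return next(iter(types))
    if types <= {"xs:integer", "xs:double"}:
        return "xs:double"
    return "xs:string"


def generate_schema(root_path, saved):
    """Phase 2: emit one schema element (or attribute) per saved node."""

    def emit(path, depth):
        entry = saved[path]
        name = path.rsplit("/", 1)[-1]
        pad = "  " * depth
        xsd_type = merge_types(entry["types"])
        if entry["kind"] == "attribute":
            return [f'{pad}<xs:attribute name="{name[1:]}" type="{xsd_type}"/>']
        elems = [c for c in entry["children"] if saved[c]["kind"] == "element"]
        attrs = [c for c in entry["children"] if saved[c]["kind"] == "attribute"]
        if not elems and not attrs:
            return [f'{pad}<xs:element name="{name}" type="{xsd_type}"/>']
        lines = [f'{pad}<xs:element name="{name}">', f"{pad}  <xs:complexType>"]
        if elems:
            lines.append(f"{pad}    <xs:sequence>")
            for child in elems:
                lines += emit(child, depth + 3)
            lines.append(f"{pad}    </xs:sequence>")
        for attr in attrs:
            lines += emit(attr, depth + 2)
        lines += [f"{pad}  </xs:complexType>", f"{pad}</xs:element>"]
        return lines

    body = emit(root_path, 1)
    return "\n".join([f'<xs:schema xmlns:xs="{XS}">', *body, "</xs:schema>"])


sample = (
    '<orders><order id="1"><item>pen</item><qty>2</qty></order>'
    '<order id="2"><item>ink</item><qty>3.5</qty></order></orders>'
)
root_path, saved = collect_nodes(sample)
schema = generate_schema(root_path, saved)
print(schema)
```

In this sketch the qty node is seen once as an integer and once as a decimal value, so the merge rule widens it to xs:double. That is one simple example of a rule that depends on the qualities recorded for a node.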