A hardware unit for parsing an XML document includes embedded logic or
circuitry for accessing the document, decoding it to convert its character
set, validating individual characters of the document, extracting tokens,
maintaining a symbol table and generating binary token headers to
describe the document's structure and convey the document's data to an
application. Tokenization, the process of identifying tokens and
generating token headers, may be controlled by a finite state machine
that recognizes XML delimiters in the document's markup and activates
state transitions based on the current state and the recognized
delimiter. The parser unit may be implemented within a hardware XML
accelerator that includes a processor, a DMA engine, a cryptographic
engine, memory (e.g., for storing a document or maintaining a symbol
table) and various interfaces (e.g., network, memory, bus).
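As a rough software analogue of the tokenization step, the following Python sketch drives a table-driven finite state machine over a simplified XML subset. The machine classifies each character as a delimiter class, looks up the transition for the current state and that delimiter, and emits tokens whose names are interned in a symbol table. Everything named here is a hypothetical illustration and not taken from the source: the state names, the token-type constants, `SymbolTable`, `tokenize`, `encode_headers`, and the 5-byte header layout (1-byte type followed by a 4-byte symbol ID or payload length). The sketch also assumes double-quoted attribute values only and omits comments, processing instructions, CDATA, entity references, character-set decoding and character validation.

```python
import struct

# Hypothetical token-header types (assumed for illustration only).
START_TAG, END_TAG, ATTR_NAME, ATTR_VALUE, TEXT, EMPTY_END = range(6)


def classify(ch):
    """Map a character to the delimiter class the FSM reacts to."""
    if ch in '</>="':
        return ch
    return "space" if ch.isspace() else "other"


# (state, delimiter class) -> (next state, action). Pairs not listed are
# accumulated in ACCUMULATING states and rejected in all other states.
TRANSITIONS = {
    ("content", "<"): ("tag_open", "flush_text"),
    ("tag_open", "/"): ("end_name", None),
    ("tag_open", "other"): ("start_name", "begin"),
    ("start_name", "space"): ("in_tag", "emit_start"),
    ("start_name", "/"): ("empty_close", "emit_start"),
    ("start_name", ">"): ("content", "emit_start"),
    ("end_name", ">"): ("content", "emit_end"),
    ("in_tag", "space"): ("in_tag", None),
    ("in_tag", "/"): ("empty_close", None),
    ("in_tag", ">"): ("content", None),
    ("in_tag", "other"): ("attr_name", "begin"),
    ("attr_name", "="): ("before_value", "emit_attr"),
    ("before_value", '"'): ("attr_value", None),
    ("attr_value", '"'): ("in_tag", "emit_value"),
    ("empty_close", ">"): ("content", "emit_empty"),
}
ACCUMULATING = {"content", "start_name", "end_name", "attr_name", "attr_value"}


class SymbolTable:
    """Interns element and attribute names as small integer IDs."""

    def __init__(self):
        self.ids = {}

    def intern(self, name):
        # A new name receives the next free ID; a known name keeps its ID.
        return self.ids.setdefault(name, len(self.ids))


def tokenize(doc, symbols):
    """Return (type, value) tokens: a symbol ID for names, text for data."""
    tokens, buf, state = [], [], "content"
    for ch in doc:
        key = (state, classify(ch))
        if key not in TRANSITIONS:
            if state not in ACCUMULATING:
                raise ValueError(f"unexpected {ch!r} in state {state}")
            buf.append(ch)
            continue
        nxt, action = TRANSITIONS[key]
        text = "".join(buf)
        if action == "flush_text" and text:
            tokens.append((TEXT, text))
        elif action == "emit_start":
            tokens.append((START_TAG, symbols.intern(text)))
        elif action == "emit_end":
            tokens.append((END_TAG, symbols.intern(text)))
        elif action == "emit_attr":
            tokens.append((ATTR_NAME, symbols.intern(text)))
        elif action == "emit_value":
            tokens.append((ATTR_VALUE, text))
        elif action == "emit_empty":
            tokens.append((EMPTY_END, 0))
        # "begin" starts a new name with the current character; every
        # other transition leaves the buffer empty.
        buf = [ch] if action == "begin" else []
        state = nxt
    if state != "content":
        raise ValueError(f"unterminated markup in state {state}")
    if buf:
        tokens.append((TEXT, "".join(buf)))
    return tokens


def encode_headers(tokens):
    """Pack each token as a 1-byte type plus a 4-byte big-endian symbol ID
    (names) or UTF-8 byte length (data); data bytes follow their header."""
    out = bytearray()
    for kind, value in tokens:
        if isinstance(value, str):
            data = value.encode("utf-8")
            out += struct.pack(">BI", kind, len(data)) + data
        else:
            out += struct.pack(">BI", kind, value)
    return bytes(out)
```

For example, `tokenize('<doc id="7"><item>hi</item><br/></doc>', SymbolTable())` emits a start-tag token for `doc`, an attribute-name token and a value token, and so on, with the repeated names `doc` and `item` mapped to the same symbol IDs in their start and end tags. Keeping the transitions in a table rather than in nested conditionals loosely mirrors how a hardware state machine can be driven by a transition table indexed by current state and recognized delimiter; supporting another delimiter then mostly means adding entries.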