A system for tokenizing a document, such as an XML document, includes a
classifier, a plurality of token logic units, and an execution unit. The
classifier is configured to assign at least one character from the
document to at least one of a plurality of character classes. Each of a plurality of
token logic units is configured to concurrently perform a comparison as
specified by an instruction. A comparison may comprise comparing the at
least one character class to an operand. The execution unit is configured
to select an action from the instruction based on the results of the
comparisons and to perform that action. A method of tokenizing a document
includes assigning at least one character from a document to at least one
of a plurality of character classes and concurrently performing a
plurality of comparisons. At least one of the plurality of comparisons
comprises comparing the assigned character class to the character from
the document. At least one action to be performed is selected based on at
least one result produced by performing the comparisons, and the selected
action is subsequently performed.
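
The flow described above can be illustrated with a minimal software sketch. It is not the claimed hardware: the class names (LT, GT, WS, NAME, OTHER), the action names (emit, append, skip), the first-match selection rule, and every function and type below are hypothetical choices made for illustration. The comparisons that the token logic units would perform concurrently are simply evaluated one after another here.

```python
from dataclasses import dataclass
from typing import List


def classify(ch: str) -> str:
    """Classifier: assign a character to one (hypothetical) character class."""
    if ch == "<":
        return "LT"
    if ch == ">":
        return "GT"
    if ch.isspace():
        return "WS"
    if ch.isalnum() or ch in "_-.:":
        return "NAME"
    return "OTHER"


@dataclass
class Comparison:
    """Work for one token logic unit: compare the class or the raw character to an operand."""
    kind: str      # "class" or "char" (assumed encoding)
    operand: str

    def evaluate(self, ch: str, cls: str) -> bool:
        return (cls if self.kind == "class" else ch) == self.operand


@dataclass
class Instruction:
    comparisons: List[Comparison]  # one comparison per token logic unit
    actions: List[str]             # actions[i] is paired with comparisons[i]
    default: str                   # action used when no comparison succeeds


def execute(instr: Instruction, ch: str) -> str:
    """Execution unit: select an action from the instruction using the comparison results."""
    cls = classify(ch)
    # Hardware would evaluate these concurrently; a list comprehension stands in for that.
    results = [c.evaluate(ch, cls) for c in instr.comparisons]
    for hit, action in zip(results, instr.actions):
        if hit:
            return action  # assumed rule: the first successful comparison wins
    return instr.default


def tokenize(text: str, instr: Instruction) -> List[str]:
    tokens: List[str] = []
    current = ""
    for ch in text:
        action = execute(instr, ch)
        if action == "append":
            current += ch          # extend the token being built
            continue
        if current:                # "emit" and "skip" both close the open token
            tokens.append(current)
            current = ""
        if action == "emit":
            tokens.append(ch)      # the character is a token on its own
    if current:
        tokens.append(current)
    return tokens


INSTR = Instruction(
    comparisons=[
        Comparison("class", "LT"),
        Comparison("class", "GT"),
        Comparison("char", "/"),
        Comparison("class", "WS"),
    ],
    actions=["emit", "emit", "emit", "skip"],
    default="append",
)

print(tokenize("<a href>text</a>", INSTR))
# prints ['<', 'a', 'href', '>', 'text', '<', '/', 'a', '>']
```

In this sketch a single instruction handles every character; a fuller model would presumably let the selected action also choose the next instruction, which the abstract does not specify.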