One embodiment of a non-word-based information retrieval system searches
stock or image documents in a large data source. A
non-word-based document is first divided into a series of elements or an
array of cells. Each element or cell is matched against a series of
predefined token patterns, and each match generates a named token.
The collection of generated named tokens is a word-based
representation of the non-word-based document. After tokens from all
documents are collected in a master collection of tokens, the
non-word-based documents can be efficiently and systematically searched
in a manner analogous to a document search in a word-based search system.
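
The flow described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes, purely for illustration, that a stock document is a series of closing prices, that each element is the percent change between consecutive prices, and that the master collection is an inverted index from token name to document identifiers. The pattern names and thresholds (`UP_BIG`, `FLAT`, and so on), the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical predefined token patterns: token name -> test applied to one
# element (here, a percent change between two consecutive prices).
TOKEN_PATTERNS = {
    "UP_BIG": lambda change: change >= 5.0,
    "UP": lambda change: 0.0 < change < 5.0,
    "FLAT": lambda change: change == 0.0,
    "DOWN": lambda change: -5.0 < change < 0.0,
    "DOWN_BIG": lambda change: change <= -5.0,
}


def tokenize(prices):
    """Divide a price series into elements and emit one named token per match."""
    tokens = []
    for prev, curr in zip(prices, prices[1:]):
        change = (curr - prev) / prev * 100.0
        for name, matches in TOKEN_PATTERNS.items():
            if matches(change):
                tokens.append(name)
                break  # the first matching pattern names the element
    return tokens


def build_index(documents):
    """Collect tokens from all documents into a master collection (inverted index)."""
    index = defaultdict(set)
    for doc_id, prices in documents.items():
        for token in tokenize(prices):
            index[token].add(doc_id)
    return index


def search(index, query_tokens):
    """Return the ids of documents that contain every token in the query."""
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings) if postings else set()


# Made-up sample data, for illustration only.
documents = {
    "ACME": [100, 106, 106, 103],
    "BOLT": [50, 51, 49, 49],
}
index = build_index(documents)
print(tokenize(documents["ACME"]))
print(sorted(search(index, ["FLAT", "DOWN"])))
```

Once each document is reduced to named tokens, a query is just a list of token names, so the lookup works the same way as a keyword lookup in a word-based system. An image document could be handled similarly by treating each cell of a grid as an element and matching it against visual patterns.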