A system and method of compression indexing and efficient proximity search
of text data permits high-speed searching that ranks the relevance of
search results according to the closeness of the desired terms within each
portion of text found. The method includes (a) preparing target text, (b)
creating a "compression index ebook", (c) browsing in a compression index
ebook, and (d) searching in a compression index ebook. To create the
compression index, the method includes the steps of selecting target text
and identifying tokens, such as words and punctuation strings, each of
which has a frequency. The frequency of each token is counted, and the
tokens are ranked from highest to lowest frequency.
The frequencies are then compressed. Next, positions are assigned to each
token frequency, and the positions are compressed to form the compression
index ebook, which is stored in random access memory (RAM) to eliminate
disk seeks during browsing and searching.