Words having selected characteristics in a corpus of documents are found
using a data processor arranged to execute queries. Memory stores an
index structure whose entries map words, and marks for words having the
selected characteristics, to locations within
documents in the corpus. Some entries in the index structure represent words
and other entries represent marks, each carrying the location information of its
marked word. The entries for the marks can be tokens coalesced with
prefixes of the respective marked words or of adjacent words. A query processor forms
a modified query by adding a mark for a word to the query, and then
executes the modified query.
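
The arrangement described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes the selected characteristic is "begins with a capital letter", and the names `MARK`, `PREFIX_LEN`, `mark_token`, `build_index`, and `find_marked` are hypothetical, as are the sample documents. Each mark entry is a token formed by coalescing a mark with a prefix of the marked word, and it is stored at the same location as that word. The modified query adds the mark for the word to the original query and intersects the two location sets.

```python
from collections import defaultdict

MARK = "#cap:"   # hypothetical mark for the assumed "capitalized" characteristic
PREFIX_LEN = 2   # hypothetical number of leading letters coalesced with the mark


def mark_token(word):
    """Coalesce the mark with a prefix of the marked word."""
    return MARK + word.lower()[:PREFIX_LEN]


def build_index(docs):
    """Map word entries and mark entries to (doc_id, position) locations."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, raw in enumerate(text.split()):
            word = raw.strip(".,;:!?")
            if not word:
                continue
            # Entry representing the word itself.
            index[word.lower()].add((doc_id, pos))
            if word[0].isupper():
                # Entry representing the mark, carrying the marked word's location.
                index[mark_token(word)].add((doc_id, pos))
    return index


def find_marked(index, word):
    """Form the modified query (word plus its mark) and execute it."""
    modified_query = [word.lower(), mark_token(word)]
    postings = [index.get(term, set()) for term in modified_query]
    # A location matches only if both the word and its mark occur there.
    return sorted(set.intersection(*postings))


# Hypothetical three-document corpus.
docs = {
    1: "Apple ships a new phone.",
    2: "She ate an apple and an apricot.",
    3: "Apricot and Apple are both trademarks.",
}
index = build_index(docs)
hits = find_marked(index, "apple")
print(hits)  # [(1, 0), (3, 2)]
```

In this sketch the mark token for "Apple" and "Apricot" is the same (`#cap:ap`), so the mark entry alone does not identify a word. The intersection with the word entry in the modified query is what narrows the result to capitalized occurrences of the queried word.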