Phrases that include stopwords are found in a corpus of documents using a
data processor arranged to execute phrase queries. Memory stores an index
structure which maps entries in the index structure to documents in the
corpus. Some entries in the index structure represent words, and other
entries represent stopwords found in the corpus coalesced with prefixes of
the respective words adjacent to the stopwords. The prefixes comprise one
or more leading characters of the respective adjacent words.
A query processor forms a modified query by replacing a stopword in the
query with a search token representing the stopword coalesced with a
prefix of the next word in the query. The query processor executes the
modified query. Index structures that include coalesced stopwords are
also created and maintained.
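
The indexing and query rewriting described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The stopword list, the one-character prefix length, the "_" separator, the positional postings layout and all function names are assumptions chosen for illustration. A stopword with no following word is left unchanged, which is a simplification.

```python
from collections import defaultdict

# Assumed for illustration: a tiny stopword list and a one-character prefix.
STOPWORDS = {"a", "the", "of", "on"}
PREFIX_LEN = 1


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def coalesce(tokens):
    """Replace each stopword with a token joining it to a prefix of the next word."""
    out = []
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS and i + 1 < len(tokens):
            out.append(tok + "_" + tokens[i + 1][:PREFIX_LEN])
        else:
            # Ordinary word, or a trailing stopword with no next word.
            out.append(tok)
    return out


def build_index(docs):
    """Map each token to {doc_id: [positions]} over the coalesced token stream."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(coalesce(tokenize(text))):
            index[tok][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Rewrite the phrase with coalesced tokens, then match consecutive positions."""
    tokens = coalesce(tokenize(phrase))
    if not tokens or any(tok not in index for tok in tokens):
        return set()
    hits = set()
    for doc_id, starts in index[tokens[0]].items():
        for start in starts:
            if all(start + k in index[tok].get(doc_id, ())
                   for k, tok in enumerate(tokens)):
                hits.add(doc_id)
                break
    return hits


docs = {
    1: "the cat sat on the mat",
    2: "the dog sat on a mat",
    3: "cat of the mat",
}
index = build_index(docs)
print(coalesce(tokenize("on the mat")))
print(phrase_query(index, "on the mat"))
```

In this sketch the query "on the mat" is rewritten to the tokens on_t, the_m and mat. Each coalesced token is typically rarer in the corpus than the bare stopword, so its postings list is shorter to scan during phrase matching.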