A computerized method, and a corresponding apparatus, for segmentation of a
stream of text elements comprising analyzed tokens into one or more
initial clauses is disclosed. In the method, a predetermined number of
consecutive text elements of said stream of text elements are scanned,
starting from a given position. The predetermined number of consecutive
text elements are compared with each pattern of a set of patterns for
beginnings of initial clauses, and a beginning of an initial clause is
identified in the predetermined number of consecutive text elements, if
the predetermined number of consecutive text elements match one pattern of
the set of patterns for beginnings of initial clauses. The given position
is then moved at least one position forward, and the scanning, comparison
and identification are repeated.
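The scan, compare, identify and advance loop described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state. Text elements are represented as part-of-speech tag strings. Each pattern is a fixed-length tuple of tags. `"*"` is a hypothetical wildcard. The position advances by exactly one element per step. The function names are illustrative only and do not describe the disclosed implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical convention (not from the source): "*" in a pattern matches any element.
WILDCARD = "*"


def window_matches(window: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the window has the pattern's length and every slot is equal or a wildcard."""
    return len(window) == len(pattern) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, window)
    )


def find_clause_beginnings(
    elements: Sequence[str], patterns: Sequence[Tuple[str, ...]], n: int
) -> List[int]:
    """Return every position whose n-element window matches at least one pattern."""
    beginnings: List[int] = []
    position = 0  # the "given position" where scanning starts
    while position + n <= len(elements):
        window = elements[position:position + n]  # scan n consecutive elements
        if any(window_matches(window, p) for p in patterns):  # compare with each pattern
            beginnings.append(position)  # identify a beginning of an initial clause
        position += 1  # move forward one position, then repeat
    return beginnings


# Usage with made-up tags (assumed Universal Dependencies-style labels).
tags = ["PRON", "VERB", "NOUN", "CCONJ", "PRON", "VERB", "SCONJ", "DET", "NOUN", "VERB"]
clause_patterns = [("CCONJ", "PRON"), ("SCONJ", WILDCARD)]
print(find_clause_beginnings(tags, clause_patterns, 2))  # [3, 6]
```

In this sketch, advancing by a single element allows clause beginnings whose windows overlap to be detected. A larger step would be faster but could skip matches. The abstract requires only that the position moves "at least one position forward".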