In a method for identifying a table of contents in a document (10), text
fragments are extracted (12) from the document. The method identifies (20,
30, 34, 38): (i) a substantially contiguous group of text fragments as
table of contents entries and (ii) a different group of text fragments as
linked text fragments linked with corresponding table of contents entries.
During the identifying, the number of text fragments that are candidates
for identification as linked text fragments is reduced based on at least
one reduction criterion (130). The identified table of contents entries
and linked text fragments (110) are validated based on at least one
validation criterion (162) related to distribution of the linked text
fragments.
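
The steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (normalize, find_toc), the parameters (max_len, min_entries, min_spread), the exact-match linking rule, and the specific reduction and validation criteria are all assumptions chosen for illustration. The sketch assumes the text fragments have already been extracted as a list of strings.

```python
import re
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """Strip a trailing page number and dot leader, then lowercase."""
    text = re.sub(r"[.\s]*\d+\s*$", "", text)
    return " ".join(text.lower().split())


def find_toc(
    fragments: List[str],
    max_len: int = 60,        # hypothetical reduction threshold
    min_entries: int = 2,     # hypothetical minimum table size
    min_spread: float = 0.5,  # hypothetical validation threshold
) -> Optional[Tuple[List[int], List[int]]]:
    """Return (entry indices, linked fragment indices), or None."""
    n = len(fragments)
    keys = [normalize(f) for f in fragments]
    best = None
    for start in range(n):
        entries: List[int] = []
        links: List[int] = []
        # Grow a contiguous run of entries beginning at `start`.
        for i in range(start, n):
            if links and i >= links[0]:
                break  # the run has reached the first linked fragment
            # Assumed reduction criteria: a candidate must follow both the
            # entry and the previous link (ordering), and must be short.
            lower = max(i, links[-1] if links else i) + 1
            candidates = [
                j for j in range(lower, n)
                if keys[i] and keys[j] == keys[i]
                and len(fragments[j]) <= max_len
            ]
            if not candidates:
                break
            entries.append(i)
            links.append(candidates[0])
        if len(entries) < min_entries:
            continue
        # Assumed validation criterion: the linked fragments must span a
        # sufficient share of the document that follows the entries.
        body_len = n - entries[-1] - 1
        spread = (links[-1] - links[0] + 1) / body_len
        if spread >= min_spread and (best is None or len(entries) > len(best[0])):
            best = (entries, links)
    return best


doc = [
    "My Report",
    "Contents",
    "Introduction ..... 1",
    "Methods ..... 4",
    "Results ..... 7",
    "Introduction",
    "This report describes an experiment.",
    "Methods",
    "We measured things twice.",
    "Results",
    "Everything worked.",
]
print(find_toc(doc))
```

In this sketch the ordering constraint and the length limit stand in for the reduction criterion (130), and the spread ratio stands in for the distribution-based validation criterion (162). A candidate table whose linked fragments cluster in a small part of the document is rejected.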