Methods, systems and machine-readable instructions for processing an
electronic document are described. In one aspect, logical blocks that
were extracted from the electronic document, including a text block
comprising text lines each encompassed by a respective bounding
rectangle, are received. Edges of one or more of the bounding rectangles are
extended to at least one boundary without changing layout relationships
among the logical blocks in the electronic document. A text layout
boundary is generated from extended and unextended edges of the bounding
rectangles. A description of the text layout boundary is stored in a
machine-readable medium.
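
The steps above can be illustrated with a minimal sketch. It is not the claimed method itself, and it rests on several assumptions: the boundary is a pair of left and right limits, only left and right edges are extended, "without changing layout relationships" is approximated as "the widened rectangle must not overlap any other logical block", the text lines are sorted top to bottom, and the description is stored as JSON. The names `Rect`, `extend_edges`, `layout_boundary`, and `boundary_to_json` are hypothetical.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def overlaps(self, other):
        # Rectangles that only touch along an edge do not count as overlapping.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def extend_edges(lines, other_blocks, left_bound, right_bound):
    """Extend each line's left and right edge to the boundary.

    An extension is kept only if the widened rectangle does not overlap
    any other logical block (assumed stand-in for preserving layout).
    """
    result = []
    for rect in lines:
        wider = replace(rect, left=min(rect.left, left_bound))
        if not any(wider.overlaps(b) for b in other_blocks):
            rect = wider
        wider = replace(rect, right=max(rect.right, right_bound))
        if not any(wider.overlaps(b) for b in other_blocks):
            rect = wider
        result.append(rect)
    return result


def layout_boundary(rects):
    """Trace a closed outline: down the right edges, then up the left edges.

    Assumes rects are sorted top to bottom; consecutive duplicate
    vertices are dropped.
    """
    points = []
    for r in rects:
        points += [(r.right, r.top), (r.right, r.bottom)]
    for r in reversed(rects):
        points += [(r.left, r.bottom), (r.left, r.top)]
    return [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]


def boundary_to_json(polygon):
    """Serialize the boundary so it can be written to a file or database."""
    return json.dumps({"text_layout_boundary": polygon})
```

For example, with three text lines and an image block to the right of the middle line, the first and third lines extend on both sides while the middle line keeps its original right edge, so the resulting boundary wraps around the image.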