A crawling system crawls a web site initially in a pattern detection phase
and subsequently in a pattern usage phase. During the pattern detection
phase, the crawling system attempts to identify patterns of references to
pages that contain
informational content of interest and patterns of references to pages
that contain little informational content of interest. During the pattern
usage phase, the crawling system crawls the web site using the identified
reference patterns. When the crawling
system encounters a reference contained on an accessed page, the crawling
system determines whether the reference matches a reference pattern. If
the reference matches a reference pattern associated with pages that
contain informational content of interest, the crawling system accesses
the referenced page. If, however, the reference matches a reference
pattern of pages with little informational content, then the crawling
system discards that reference without accessing the referenced page.
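
The pattern usage phase described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the names classify_reference, crawl_with_patterns, and get_links are hypothetical, reference patterns are assumed to be regular expressions over URLs, and references that match no pattern are simply skipped here, a case the description does not address.

```python
import re
from collections import deque


def classify_reference(url, content_patterns, low_value_patterns):
    """Return 'access', 'discard', or 'unknown' for a reference (hypothetical helper)."""
    # Reference matches a pattern of pages with informational content of interest.
    if any(p.search(url) for p in content_patterns):
        return "access"
    # Reference matches a pattern of pages with little informational content.
    if any(p.search(url) for p in low_value_patterns):
        return "discard"
    return "unknown"


def crawl_with_patterns(start_url, get_links, content_patterns, low_value_patterns):
    """Breadth-first crawl that only follows references matching content patterns.

    get_links is an assumed callable that returns the references found on a page;
    in a real crawler it would fetch and parse the page.
    """
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        visited.append(url)  # stands in for accessing the page
        for ref in get_links(url):
            if ref in seen:
                continue
            seen.add(ref)
            decision = classify_reference(ref, content_patterns, low_value_patterns)
            if decision == "access":
                queue.append(ref)
            # 'discard': dropped without accessing the referenced page.
            # 'unknown': also skipped in this sketch (an assumption).
    return visited
```

Passing get_links as a parameter keeps the sketch free of network code; an in-memory dictionary mapping each page to its outgoing references is enough to exercise it.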