A method (100) of crawling the Web (620) is disclosed. The method (100)
crawls (120) Web pages on the Web starting from a given (110) set of seed
Uniform Resource Locators (URLs). Crawled Web pages are partitioned
(140) into sets of relevant and irrelevant pages. A set of exclusion
and/or inclusion patterns is discovered (150) from the sets of relevant
and irrelevant pages, and subsequent crawling of the Web is restricted
by the set of exclusion and/or inclusion patterns.
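
The steps above can be illustrated with a minimal sketch. It is not the disclosed implementation: the pattern form (a URL prefix made of host plus first path segment), the `min_support` threshold, the `sample_size` trigger, the keyword-based relevance test, and the in-memory `WEB` table standing in for real page fetches are all assumptions chosen for illustration. Only exclusion patterns are shown.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy "Web": URL -> (page text, outgoing links). Stands in for HTTP fetches.
WEB = {
    "http://x.test/": ("crawler home", ["http://x.test/docs/a", "http://x.test/ads/1", "http://x.test/ads/2"]),
    "http://x.test/docs/a": ("crawler docs", ["http://x.test/docs/b", "http://x.test/ads/3"]),
    "http://x.test/ads/1": ("buy now", ["http://x.test/ads/4"]),
    "http://x.test/ads/2": ("buy now", []),
    "http://x.test/docs/b": ("crawler notes", []),
    "http://x.test/ads/3": ("buy now", []),
    "http://x.test/ads/4": ("buy now", []),
}

def fetch(url):
    """Return (text, links) for a URL from the toy table."""
    return WEB[url]

def is_relevant(text):
    """Assumed relevance test: a simple keyword match."""
    return "crawler" in text

def url_prefix(url):
    """Return scheme://host/first-segment/ as a coarse pattern, or None for a bare host."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return f"{parts.scheme}://{parts.netloc}/{segments[0]}/"

def discover_exclusions(relevant, irrelevant, min_support=2):
    """Prefixes seen on at least min_support irrelevant pages and on no relevant page."""
    relevant_prefixes = {url_prefix(u) for u in relevant}
    counts = {}
    for u in irrelevant:
        p = url_prefix(u)
        if p is not None:
            counts[p] = counts.get(p, 0) + 1
    return {p for p, n in counts.items() if n >= min_support and p not in relevant_prefixes}

def crawl(seeds, sample_size=4, max_pages=100):
    """Breadth-first crawl; after sample_size pages, discover exclusions and apply them."""
    frontier = deque(seeds)
    seen = set(seeds)
    relevant, irrelevant = [], []
    exclusions = set()
    while frontier and len(relevant) + len(irrelevant) < max_pages:
        url = frontier.popleft()
        # Restrict subsequent crawling: skip URLs matching a discovered exclusion pattern.
        if any(url.startswith(p) for p in exclusions):
            continue
        text, links = fetch(url)
        # Partition crawled pages into relevant and irrelevant sets.
        (relevant if is_relevant(text) else irrelevant).append(url)
        # Discover patterns once, from the initial sample of crawled pages.
        if len(relevant) + len(irrelevant) == sample_size:
            exclusions = discover_exclusions(relevant, irrelevant)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant, irrelevant, exclusions

relevant, irrelevant, exclusions = crawl(["http://x.test/"])
```

In this toy run, the first four crawled pages are enough to mark the `/ads/` prefix as an exclusion pattern, so the remaining `/ads/` URLs in the frontier are never fetched. Inclusion patterns could be derived symmetrically from prefixes that occur only on relevant pages.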