A dangling web page processing system ranks dangling web pages on the web.
The system assigns ranks to high-quality dangling web pages that a
crawler cannot crawl. In addition, the system adjusts ranks to penalize
dangling web pages that return errors when links on the dangling web
pages are crawled. By providing a rank for dangling web pages, the
present system allows crawling resources to be concentrated on the
highest-ranked dangling web pages in the uncrawled region.
The system's computation is local to the dangling web pages, so their
ranks can be determined efficiently. The system
explicitly penalizes web pages that point to penalty pages, i.e., pages
that return an error when a link to them is followed. By incorporating
more fine-grained information such as this
into ranking, the system can improve the quality of individual search
results and better manage resources for crawling.
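
The ranking and penalty ideas described above can be illustrated with a
minimal sketch. It is not the system's actual method: it assumes a
standard PageRank power iteration in which the rank held by dangling
pages is redistributed uniformly, followed by a hypothetical penalty
step. The function names (pagerank, penalize, crawl_frontier), the
damping value, and the penalty factor of 0.5 are all assumptions made
for illustration.

```python
from typing import Dict, List, Set


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; dangling mass is spread uniformly (assumption)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ordered = sorted(pages)
    n = len(ordered)
    rank = {p: 1.0 / n for p in ordered}
    for _ in range(iterations):
        # Pages with no known outlinks are treated as dangling.
        dangling_mass = sum(rank[p] for p in ordered if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling_mass / n
                    for p in ordered}
        for p in ordered:
            targets = links.get(p, [])
            for t in targets:
                new_rank[t] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


def penalize(rank: Dict[str, float], links: Dict[str, List[str]],
             penalty_pages: Set[str], factor: float = 0.5) -> Dict[str, float]:
    """Hypothetical adjustment: scale down penalty pages and pages linking to them."""
    adjusted = dict(rank)
    for p in rank:
        targets = links.get(p, [])
        if p in penalty_pages:
            adjusted[p] *= factor
        elif targets:
            # Fraction of this page's outlinks that lead to penalty pages.
            bad = sum(1 for t in targets if t in penalty_pages) / len(targets)
            adjusted[p] *= 1 - (1 - factor) * bad
    return adjusted


def crawl_frontier(rank: Dict[str, float], links: Dict[str, List[str]],
                   penalty_pages: Set[str]) -> List[str]:
    """Dangling, non-penalty pages ordered from highest to lowest rank."""
    dangling = [p for p in rank if not links.get(p) and p not in penalty_pages]
    return sorted(dangling, key=lambda p: rank[p], reverse=True)


# Toy graph: "x" and "y" are dangling; "y" returned an error when fetched.
links = {"a": ["b", "c", "x"], "b": ["c", "y"], "c": ["a"]}
penalty_pages = {"y"}
rank = pagerank(links)
adjusted = penalize(rank, links, penalty_pages)
frontier = crawl_frontier(adjusted, links, penalty_pages)
```

In this toy graph, page "b" has half of its links pointing to the
penalty page "y", so its adjusted rank is scaled by 0.75, and the crawl
frontier contains only the dangling page "x".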