Path-based ranking of unvisited web pages for WWW crawling is provided
by identifying all the paths that begin at a "seed" URL and lead to
visited relevant web pages as a "good-path set" and, for each unvisited web
page, identifying the paths that begin at the "seed" URL and lead to it
as a "partial-path set"; classifying all the visited web pages and labeling
each web page with the label of the class or classes it belongs to;
training a statistical model to generalize the common patterns among the
paths in the "good-path set"; and evaluating the "partial-path set" with the
statistical model and ranking the unvisited web pages by the evaluation
results.
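
The steps summarized above can be illustrated with a minimal sketch. The abstract does not name a particular statistical model, so the sketch substitutes a first-order Markov chain over class labels with add-one smoothing purely for illustration. It further assumes that each visited page carries a single class label and that a path is represented as the sequence of labels of the visited pages along it. All function names, class labels, and URLs below are hypothetical.

```python
import math
from collections import defaultdict

START = "<start>"  # hypothetical marker for the seed end of every path


def train_path_model(good_paths, classes):
    """Count label-to-label transitions along the good paths.

    Returns prob(prev, label) with add-one smoothing, so transitions
    never seen in training still receive a small probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in good_paths:
        prev = START
        for label in path:
            counts[prev][label] += 1
            prev = label
    n_classes = len(set(classes))

    def prob(prev, label):
        row = counts.get(prev, {})
        return (row.get(label, 0) + 1) / (sum(row.values()) + n_classes)

    return prob


def score_path(prob, path):
    """Mean log-probability per step, so paths of different lengths compare."""
    if not path:
        return float("-inf")
    prev, total = START, 0.0
    for label in path:
        total += math.log(prob(prev, label))
        prev = label
    return total / len(path)


def rank_unvisited(prob, partial_paths):
    """Rank each unvisited URL by the best score among its partial paths."""
    scored = {
        url: max((score_path(prob, p) for p in paths), default=float("-inf"))
        for url, paths in partial_paths.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Assumed toy data: class labels of the visited pages on each path.
classes = ["hub", "list", "product", "news"]
good_paths = [
    ["hub", "list", "product"],
    ["hub", "list", "product"],
    ["hub", "product"],
]
# Partial paths end at the last visited page before the unvisited URL.
partial_paths = {
    "http://example.com/a": [["hub", "list"]],
    "http://example.com/b": [["hub", "news"]],
}

prob = train_path_model(good_paths, classes)
ranked = rank_unvisited(prob, partial_paths)
for url, score in ranked:
    print(f"{score:.3f}  {url}")
```

In this toy data the page reached through a "hub" then "list" path ranks ahead of the one reached through "hub" then "news", because the former prefix resembles the good paths. Taking the best-scoring partial path per page is one possible design choice; averaging over all partial paths would be an equally plausible alternative.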