A focused random walk system produces samples of on-topic pages from a
collection of hyper-linked pages such as Web pages. The focused random
walk system uses a focused random walk to produce a focused sample,
that is, a random sample of Web pages restricted to a given topic. The focused
random walk system uniformly samples pages iteratively, where each
iteration follows a random link from a union of the in-links and
out-links of a page. The system then classifies the page reached through
the randomly selected link to determine whether that page is on-topic. The
random walk sampling process may comprise a hard-focus method that selects only on-topic
pages at each step of the focused random walk, or a soft-focus method
that allows limited divergence to off-topic pages.
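
The walk described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `build_neighbors`, `focused_walk`, `is_on_topic`, and `max_divergence` are hypothetical, the classifier is a caller-supplied function, and "limited divergence" is assumed here to mean a cap on consecutive off-topic steps. The sketch also assumes the start page is on-topic and omits any correction for degree bias that uniform sampling would require.

```python
import random
from collections import defaultdict


def build_neighbors(out_links):
    """Map each page to the union of its in-links and out-links."""
    neighbors = defaultdict(set)
    for page, targets in out_links.items():
        neighbors[page]  # make sure pages without links still get an entry
        for target in targets:
            neighbors[page].add(target)   # out-link of `page`
            neighbors[target].add(page)   # in-link of `target`
    return {page: sorted(links) for page, links in neighbors.items()}


def focused_walk(out_links, is_on_topic, start, steps,
                 max_divergence=0, seed=None):
    """Return the on-topic pages visited by the walk, in visit order.

    max_divergence=0 gives hard focus (never step onto an off-topic page).
    max_divergence>0 gives soft focus under the assumption stated above:
    at most that many consecutive off-topic pages may be visited.
    """
    rng = random.Random(seed)
    neighbors = build_neighbors(out_links)
    current = start
    last_on_topic = start   # assumed on-topic
    off_topic_run = 0
    samples = []
    for _ in range(steps):
        candidates = neighbors.get(current, [])
        if not candidates:
            # Dead end: fall back to the last on-topic page.
            current, off_topic_run = last_on_topic, 0
            continue
        candidate = rng.choice(candidates)  # random link from the union
        if is_on_topic(candidate):
            current = last_on_topic = candidate
            off_topic_run = 0
            samples.append(candidate)
        elif off_topic_run < max_divergence:
            # Soft focus: tolerate a bounded off-topic excursion.
            current = candidate
            off_topic_run += 1
        else:
            # Reject the step and resume from the last on-topic page.
            current = last_on_topic
            off_topic_run = 0
    return samples
```

For example, with `out_links = {"a": ["b"], "b": ["x"], "x": ["d"]}` and a classifier that accepts every page except `"x"`, a hard-focus walk started at `"a"` can never reach `"d"`, because the only path to it passes through the off-topic page `"x"`. A soft-focus walk with `max_divergence=1` is allowed to cross `"x"` and may therefore sample `"d"`.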