A method for downloading HTML formatted Web pages is provided. The method
includes the steps of writing a URL of a Web page to be downloaded to an
XQuery script; analyzing the XQuery script to obtain the URL of the HTML
Web page and saving the downloaded Web page in a database as a local
Web page; analyzing the contents of the local Web page to obtain target
contents; converting the relative URLs of all image files to the absolute
URLs; downloading all the image files according to the absolute URLs;
replacing the absolute URLs of the image files with a local image file
path; converting the relative URLs of the embedded links to the absolute
URLs of the embedded links; saving all the converted absolute URLs in the
database and creating identifiers; and replacing the converted absolute URLs of
the embedded links with an embedded link local path. A related system is
also disclosed.
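
The image-handling steps described above (resolving relative image URLs to absolute URLs, then replacing them with local file paths) can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names `local_image_path` and `localize_images`, the `images` directory, and the hash-based file naming are assumptions made for illustration, and the actual download and database steps are left out.

```python
import hashlib
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Matches the src attribute of <img> tags. A regex keeps the sketch short;
# a real HTML parser would be more robust.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def local_image_path(absolute_url, image_dir="images"):
    """Derive a hypothetical local file path from an absolute image URL."""
    digest = hashlib.sha1(absolute_url.encode("utf-8")).hexdigest()[:12]
    ext = posixpath.splitext(urlparse(absolute_url).path)[1] or ".img"
    return f"{image_dir}/{digest}{ext}"


def localize_images(html, page_url, image_dir="images"):
    """Return (rewritten_html, {absolute_url: local_path}).

    The mapping lists every image that would need to be downloaded and
    the local path (an assumed naming scheme) it would be saved under.
    """
    downloads = {}

    def rewrite(match):
        prefix, quote, src = match.groups()
        # Step 1: relative URL -> absolute URL, resolved against the page URL.
        absolute = urljoin(page_url, src)
        # Step 2: absolute URL -> local image file path.
        local = downloads.setdefault(
            absolute, local_image_path(absolute, image_dir)
        )
        return f"{prefix}{quote}{local}{quote}"

    return IMG_SRC.sub(rewrite, html), downloads
```

Calling `localize_images(page_html, page_url)` returns the rewritten markup together with a mapping from each absolute image URL to its assumed local path; a caller could iterate over that mapping to fetch and store the files. The same resolve-then-replace pattern would apply to the embedded links, with the local path derived from whatever identifier the database assigns.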