A data collection and retrieval system and associated method allow the capture
and replication of data presented on various Web pages into a database application
through text parsing of the HTML source code of those pages. The system allows
the user to select one or more Web pages containing data of interest; to specify
exactly which data within any page is to be captured; to specify how frequently
data is to be collected; and to specify the conditions for collection and retrieval.
The advantage of the system lies in efficient, automated collection of data
that would be impractical to gather manually. The system includes an initialization stage
and an automatic execution stage. The initialization stage provides the user interface
through which the user selects the source file that contains the data to be copied,
the target database that will receive the data, and the timing criteria for automatic
transfer of the data. The automatic execution stage then transfers the data
from the source file to the target database as instructed by the user in the initialization stage.
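
The two stages described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the names `CaptureJob`, `ElementTextParser`, `extract_value`, `run_transfer`, and `SAMPLE_PAGE` are hypothetical, the data of interest is assumed to be identified by an HTML `id` attribute, and an SQLite table named `captured` stands in for the target database. The timing criterion is only recorded here; a scheduler would be expected to call `run_transfer` at that interval.

```python
import sqlite3
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical page source standing in for a fetched Web page.
SAMPLE_PAGE = (
    '<html><body><table><tr><td>Price</td>'
    '<td id="price"><b>42.50</b> USD</td></tr></table></body></html>'
)


@dataclass
class CaptureJob:
    """Hypothetical record of the choices made in the initialization stage."""
    source_name: str       # label for the page the user selected
    element_id: str        # assumed way of marking which data to capture
    interval_seconds: int  # timing criterion; recorded only, not enforced here


class ElementTextParser(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_value(html, element_id):
    """Text-parse the HTML source and return the selected element's text."""
    parser = ElementTextParser(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def run_transfer(job, html, conn):
    """Execution stage: copy the captured value into the target database."""
    value = extract_value(html, job.element_id)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS captured "
        "(source TEXT, element_id TEXT, value TEXT)"
    )
    conn.execute(
        "INSERT INTO captured VALUES (?, ?, ?)",
        (job.source_name, job.element_id, value),
    )
    conn.commit()
    return value
```

In this sketch, creating a `CaptureJob` plays the role of the initialization stage, and each call to `run_transfer(job, SAMPLE_PAGE, sqlite3.connect(":memory:"))` plays the role of one automatic transfer. A real deployment would fetch the page source over the network and would need a more robust way of locating the data than a single `id` lookup.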