A date querying system processes free-form text in documents to identify
and locate some or all of the dates in the documents using extended
regular expression matching to capture various date formats. The system
packages a canonicalized form of each identified date to support
various types of queries, such as specific date querying,
hierarchical date querying, range date querying, proximity queries
comprising a date and any keywords, and any combination of types of
queries. The system scans a document to identify the dates occurring in
it in their various formats, disambiguates the resulting occurrences of
dates, and canonicalizes the dates according to one or more predetermined
formats.
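
The scan, disambiguate, and canonicalize steps described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the system itself: the three patterns, the function names (find_dates, in_range, under_prefix, near_keyword), the month_first tie-break rule, the choice of ISO 8601 (YYYY-MM-DD) as the canonical format, and the character-distance notion of proximity.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Hypothetical patterns covering only three illustrative date formats.
PATTERNS = [
    # ISO style, e.g. 2005-03-04
    ("iso", re.compile(r"\b(?P<y>\d{4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})\b")),
    # Numeric with slashes, e.g. 03/04/2005 (field order is ambiguous)
    ("slash", re.compile(r"\b(?P<a>\d{1,2})/(?P<b>\d{1,2})/(?P<y>\d{4})\b")),
    # Month name, e.g. March 4, 2005
    ("named", re.compile(r"\b(?P<mon>[A-Za-z]+)\s+(?P<d>\d{1,2}),\s*(?P<y>\d{4})\b")),
]


def find_dates(text, month_first=True):
    """Return (offset, canonical_date) pairs sorted by position in the text."""
    hits = []
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            g = match.groupdict()
            year = int(g["y"])
            if kind == "iso":
                month, day = int(g["m"]), int(g["d"])
            elif kind == "slash":
                a, b = int(g["a"]), int(g["b"])
                # Disambiguation: a field above 12 can only be the day;
                # otherwise fall back to the assumed month_first convention.
                if a > 12:
                    day, month = a, b
                elif b > 12:
                    month, day = a, b
                else:
                    month, day = (a, b) if month_first else (b, a)
            else:
                month = MONTHS.get(g["mon"].lower())
                if month is None:
                    continue  # a word followed by digits, not a month name
                day = int(g["d"])
            try:
                # Canonicalize to ISO 8601; impossible dates are dropped.
                canonical = date(year, month, day).isoformat()
            except ValueError:
                continue
            hits.append((match.start(), canonical))
    return sorted(hits)


def in_range(hits, start, end):
    """Range query: ISO strings with 4-digit years sort chronologically."""
    return [h for h in hits if start <= h[1] <= end]


def under_prefix(hits, prefix):
    """Hierarchical query: '2005-' selects a year, '2005-03-' a month."""
    return [h for h in hits if h[1].startswith(prefix)]


def near_keyword(text, hits, keyword, window=40):
    """Proximity query: dates starting within `window` characters of a keyword."""
    positions = [m.start()
                 for m in re.finditer(re.escape(keyword), text, re.IGNORECASE)]
    return [h for h in hits if any(abs(h[0] - p) <= window for p in positions)]


text = "Signed on March 4, 2005; amended 2005-04-10 and again 25/12/2006."
hits = find_dates(text)
```

Because every occurrence is reduced to one canonical string plus its offset, the specific, hierarchical, range, and proximity queries in this sketch reduce to simple string and integer comparisons, and they can be combined by chaining the filters.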