A method and system are provided for filtering harmful HTML content from
an electronic document. An application program interface (API) examines
the fundamental structure of the HTML content in the document. The HTML
content in the electronic document is parsed into HTML elements and
attributes by a tokenizer and compared to a content library by a filter
in the API. The filter removes unknown HTML content as well as known
content that is listed as harmful in the content library. After the
harmful HTML content has been removed, a new document is encoded that
includes the remaining safe HTML content for viewing in a web browser.
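
The tokenize, compare, and re-encode flow described above can be
illustrated with a minimal sketch. It is not the claimed implementation:
it assumes Python's standard html.parser module as the tokenizer, and
the names SAFE_ELEMENTS, SAFE_ATTRIBUTES, HARMFUL_ELEMENTS, HtmlFilter,
and filter_html are hypothetical stand-ins for the content library and
the filter. The sketch also makes two assumptions the abstract does not
state: the text inside an unknown element is kept while its tags are
dropped, and "javascript:" links are treated as harmful.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical content library (assumption, not from the source).
SAFE_ELEMENTS = {"p", "b", "i", "a"}                # known, safe elements
SAFE_ATTRIBUTES = {"href", "title"}                 # known, safe attributes
HARMFUL_ELEMENTS = {"script", "iframe", "object"}   # known, listed as harmful


class HtmlFilter(HTMLParser):
    """Tokenizes HTML and keeps only content listed as safe."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of the re-encoded document
        self.skip_depth = 0    # > 0 while inside a harmful element

    def handle_starttag(self, tag, attrs):
        if tag in HARMFUL_ELEMENTS:
            self.skip_depth += 1   # drop the element and its contents
            return
        if self.skip_depth or tag not in SAFE_ELEMENTS:
            return                 # unknown element: drop the tag itself
        kept = []
        for name, value in attrs:
            if name not in SAFE_ATTRIBUTES or value is None:
                continue           # unknown attribute: drop it
            if name == "href" and value.strip().lower().startswith("javascript:"):
                continue           # assumed harmful: script-bearing link
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in HARMFUL_ELEMENTS:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth == 0 and tag in SAFE_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def filter_html(document: str) -> str:
    """Return a new document containing only the safe HTML content."""
    html_filter = HtmlFilter()
    html_filter.feed(document)
    html_filter.close()   # flush any buffered text
    return "".join(html_filter.out)
```

As a usage example, filter_html('<p onclick="evil()">Hi</p>') keeps the
paragraph but drops the unknown onclick attribute. Treating everything
not listed as safe as removable means newly introduced elements or
attributes are dropped by default until they are added to the library.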