The present invention involves a system and method that facilitate
extracting data from messages for spam filtering. The extracted data can
be in the form of features, which can be employed in connection with
machine learning systems to build improved filters. Data associated with
origination information, as well as other information embedded in the body
of the message that allows a recipient of the message to contact and/or
respond to the sender of the message, can be extracted as features. The
features, or a subset thereof, can be normalized and/or deobfuscated
prior to being employed as inputs to the machine learning systems. The
(deobfuscated) features can be employed to populate a plurality of
feature lists that facilitate spam detection and prevention. Exemplary
features include an email address, an IP address, a URL, an embedded
image pointing to a URL, and/or portions thereof.
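
By way of illustration only, the extraction and deobfuscation described above might be sketched as follows in Python. The function names (deobfuscate, extract_features), the regular expressions, and the choice of HTML-entity and percent-encoding decoding as the deobfuscation steps are assumptions made for this sketch and are not taken from the specification. The patterns are deliberately simplistic. URLs found in image tags are captured by the same URL pattern as other links.

```python
import html
import ipaddress
import re
from urllib.parse import unquote, urlsplit

# Simplified patterns, assumed for this sketch only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def deobfuscate(text):
    """Undo two common obfuscations: HTML entities, then percent-encoding."""
    return unquote(html.unescape(text))


def extract_features(body):
    """Return candidate contact features found in a message body."""
    clean = deobfuscate(body)
    features = {"url": set(), "host": set(), "email": set(), "ip": set()}

    for url in URL_RE.findall(clean):
        features["url"].add(url)
        # urlsplit().hostname is lowercased, which normalizes the host portion.
        host = urlsplit(url).hostname
        if host:
            features["host"].add(host)

    for addr in EMAIL_RE.findall(clean):
        features["email"].add(addr.lower())

    for candidate in IP_RE.findall(clean):
        try:
            ipaddress.ip_address(candidate)  # reject out-of-range octets
        except ValueError:
            continue
        features["ip"].add(candidate)

    return features


body = ('Click <a href="http://EXAMPLE.com/%6Fffer">here</a> '
        'or mail sales&#64;example.org from 192.0.2.7')
feats = extract_features(body)
```

Each resulting set (full URL, host portion, email address, IP address) could then populate a separate feature list for the machine learning system. Decoding the entire body before matching is a simplification; a fuller implementation would likely parse the message structure first.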