Methods, apparatuses, and computer-readable media for detecting bulk
electronic messages using header similarity analysis. Bulk electronic
messages can be detected by parsing (115) header fields of an initial electronic
message; associating (120) at least one constituent unit with each header
field, thereby defining a set of constituent units for each header field;
ascertaining (230) a feature vector for each set of constituent units;
forming (240) a collection of feature vectors; and computing (250) an
inner product from a set of constituent units of an additional
electronic message and the collection of feature vectors of the initial
electronic message, the inner product providing a measure of similarity
between the initial electronic message and the additional electronic message.
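
By way of illustration only, the steps recited above may be sketched in Python as follows. The sketch is non-limiting and rests on assumptions the abstract does not state: a constituent unit is taken to be a lowercase alphanumeric token of a header value, a feature vector is taken to be a sparse count of those tokens, and the per-field inner products are summed into a single score. The function names (constituent_units, header_units, feature_vectors, similarity) are hypothetical and chosen for readability.

```python
import re
from collections import Counter
from email import message_from_string


def constituent_units(value):
    """Split a header value into lowercase alphanumeric tokens.

    Assumption: tokens stand in for "constituent units"; other
    choices (character n-grams, whole fields) are equally possible.
    """
    return re.findall(r"[a-z0-9]+", value.lower())


def header_units(raw_message):
    """Parse the header fields and associate units with each field."""
    msg = message_from_string(raw_message)
    units = {}
    for name, value in msg.items():
        # Repeated fields (e.g. Received) are pooled under one key.
        units.setdefault(name.lower(), []).extend(constituent_units(str(value)))
    return units


def feature_vectors(units_by_field):
    """Build one sparse count vector per header field."""
    return {field: Counter(units) for field, units in units_by_field.items()}


def similarity(collection, other_units):
    """Sum the per-field inner products of stored vectors and new unit counts."""
    total = 0
    for field, units in other_units.items():
        vec = collection.get(field)
        if vec is None:
            continue  # field absent from the initial message
        counts = Counter(units)
        total += sum(vec[unit] * n for unit, n in counts.items())
    return total


initial = "From: Deals <promo@example.com>\nSubject: Big sale today\n\nbody"
additional = "From: Deals <promo@example.com>\nSubject: Big sale tomorrow\n\nbody"

collection = feature_vectors(header_units(initial))
print(similarity(collection, header_units(additional)))
```

A higher score indicates that more header tokens are shared between the two messages. The score in this sketch is unnormalized, so a practical embodiment would likely scale it (for example by vector norms) before comparing it against a threshold.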