Exploiting Transport-Level Characteristics of Spam

Robert Beverly and Karen Sollins.
Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS 2008),
Mountain View, CA, August 2008.

We present a spam detection technique that relies on neither content nor reputation analysis. Instead, this work investigates the discriminatory power of the email TCP packet stream. From a corpus of packet flows and their corresponding messages, we extract per-email transport-layer features. While legitimate mail traffic is well-behaved, we observe small congestion windows, retransmissions, loss and large latencies in spam flows. To identify the most selective flow properties, thereby adapting to different networks and users, we build "SpamFlow." On our data, SpamFlow achieves greater than 90% classification accuracy while correctly identifying 78% of the false negatives from a popular content filter. By capitalizing on spam's fundamental requirement to source large quantities of mail, often from resource-constrained hosts and networks, SpamFlow promises a unique and difficult-to-subvert complement to existing spam defenses.
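
As a rough illustration of what per-email transport-layer features might look like, the sketch below summarizes one mail flow from a list of already-parsed TCP segments. It is not the SpamFlow implementation: the `Segment` record, the `flow_features` function, and the choice of features (retransmission count, mean round-trip time, flow duration) are assumptions made for this example, and a real system would derive such values from a packet capture and pass them to a learned classifier.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One TCP data segment from the remote mail host (hypothetical record)."""
    timestamp: float                    # seconds since the flow's first packet
    seq: int                            # sequence number of the first payload byte
    length: int                         # payload size in bytes
    rtt_sample: Optional[float] = None  # seconds, when an ACK could be matched


def flow_features(segments):
    """Summarize one mail flow as a small dict of transport-layer features."""
    seen = set()
    retransmissions = 0
    for seg in segments:
        key = (seg.seq, seg.length)
        if key in seen:
            # Same byte range sent again: count it as a retransmission.
            retransmissions += 1
        else:
            seen.add(key)

    rtts = [s.rtt_sample for s in segments if s.rtt_sample is not None]
    duration = segments[-1].timestamp - segments[0].timestamp if segments else 0.0
    return {
        "packets": len(segments),
        "retransmissions": retransmissions,
        "mean_rtt": mean(rtts) if rtts else None,
        "duration": duration,
    }
```

A flow with many retransmissions or a large mean round-trip time would, under the abstract's observations, look more like spam than a well-behaved legitimate flow; which features actually discriminate on a given network is something a classifier would have to learn from labeled data.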

