An academic approach to anti-spam


A recent New Scientist article, “To beat spam, turn its own weapons against it”, describes the work done by a team of academics to find a more effective way to filter spam. The team, from ICSI Berkeley and UC San Diego, has come up with a way of analyzing the spam email messages sent by a ‘captured’ zombie PC. After watching the zombie’s spam outpourings for about 10 minutes, they managed to reconstruct the underlying template used to create the numerous variations of a particular spam message. With the template in hand, they could instruct spam filters to watch out for any message that matches it.
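To make the idea concrete, here is a minimal Python sketch of template inference. It is not the researchers’ actual algorithm, which the article does not spell out. It simply assumes that the text shared by every captured variation is the fixed part of the template and that everything else is a variable slot. The function names and the sample messages are made up for illustration.

```python
from difflib import SequenceMatcher


def common_tokens(a, b):
    """Keep the tokens shared by a and b, in order; None marks a variable slot."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        elif not out or out[-1] is not None:
            # Collapse any run of differing tokens into a single slot.
            out.append(None)
    return out


def infer_template(messages):
    """Fold all captured messages into one template of literals and slots."""
    token_lists = [m.split() for m in messages]
    template = token_lists[0]
    for tokens in token_lists[1:]:
        template = common_tokens(template, tokens)
    return template


def matches(template, message):
    """Loose check: every literal token appears in the message, in order."""
    tokens = iter(message.split())
    # `lit in tokens` consumes the iterator, which enforces the ordering.
    return all(lit in tokens for lit in template if lit is not None)


# Hypothetical variations, as if captured from a single zombie.
samples = [
    "Hi Alice buy cheap watches at shop-a.example today",
    "Hi Bob buy cheap watches at shop-b.example today",
    "Hi Carol buy cheap watches at shop-c.example today",
]
template = infer_template(samples)
print(template)
print(matches(template, "Hi Dave buy cheap watches at shop-d.example today"))
```

A real filter would need far more than this (obfuscated words, HTML, images, and templates that change mid-campaign), but the sketch shows why a handful of variations can be enough to expose the fixed skeleton of a message.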

Our CTO Amir Lev discussed the validity of the academics’ approach in his blog post where he writes, “I congratulate the team; in many ways, it’s similar to how our technology works. However, I’d like to suggest that the technique as described is going to be too simplistic for the real world. Ten minutes is far too long to derive the template: in ten minutes, a botnet can deliver millions of spam messages. The template can change quite frequently, too, rendering the work done to derive the template useless.

“Spying on just one zombie at one location is a major limitation: you need a widely distributed system – millions of nodes all around the internet – in order to quickly capture sufficient breadth of data. And you need fast, automatic, efficient processing to collate all that information into spam signatures for filters to match against.” Commtouch has an extensive worldwide network collecting these sorts of samples, which are analyzed with our patented Recurrent Pattern Detection (RPD) technology. With RPD we identify the template-driven features of any new spam campaign in seconds, by examining billions of transactions from about a million different bots daily.

I decided to review the Internet archives (i.e., Google) to see what other academic initiatives against spam have been shared. A May 2005 article (also in the New Scientist) discusses a community rating approach to identifying spam. This has since been used with reasonable success by some anti-spam companies, but it suffers from the same issue as the new approach, namely: “give us at least 10 minutes to deal with this spam outbreak”. As described above, 10 minutes is just too long.

The other academic initiatives that I found mostly suggested improvements to known techniques. These include better signature generation and the use of more mathematically complex filters. One system uses analogies to the workings of the human immune system (“take 2 aspirin and your spam will just disappear”).

Regardless of the validity of these approaches, it’s great that academia continues to consider spam a topic worthy of research, and we welcome the open discussion and brainstorming that such initiatives promote.

While writing this, it also occurred to me that there must be a sizeable group of “academics” working for the “other side” – let’s call them “spamademics”. Day and night, the spamademics research ways to outwit the numerous technologies arrayed against them. Now that’s research I would love to see…
