Bayesian poisoning in web sites?


Here’s a new one that our detection center passed to me yesterday; I had to think about it for a day or so to figure out what the spammers were doing. A spam outbreak contained messages linking to the following (slightly edited by me) pornography site:


Porn Site - top of the page

Looks pretty self-explanatory. But then, if you scroll down, you get a whole set of new content, most of which looks like legitimate search engine results, totally unrelated to pornography:

Pornography Site - the legitimate search links at the bottom of the page

So why are the spammers/pornographers doing this? I mulled it over and then consulted with a few people here at Commtouch. There were a few ideas, but here is the conjecture that got the most votes. Remember Bayesian poisoning? That is the legitimate-looking text added to the bottom of spam messages to fool Bayesian filters, which score a message by how often its words have shown up in spam versus legitimate mail. By including actual legitimate text and hyperlinks at the bottom of this spam web page, the authors may have intended to fool some filters, perhaps those email filters that follow links within email messages to help determine whether the messages are spam or malware.
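To make the conjecture concrete, here is a minimal sketch of how padding can pull down a naive Bayes spam score. Everything in it is an assumption made for illustration: the tiny training sets, the function names, the equal priors, and the add-one smoothing are all invented, and nothing here reflects how any real filter (ours included) scores a page.

```python
import math
from collections import Counter

# Toy training data, invented purely for illustration.
SPAM_DOCS = ["cheap pills buy now", "hot pics click now", "buy cheap pics now"]
HAM_DOCS = [
    "weather forecast for today",
    "search results for local news",
    "news and weather today",
]


def count_words(docs):
    """Count how often each word appears across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts


SPAM_COUNTS = count_words(SPAM_DOCS)
HAM_COUNTS = count_words(HAM_DOCS)
VOCAB = set(SPAM_COUNTS) | set(HAM_COUNTS)


def spam_probability(text):
    """Naive Bayes spam probability with equal priors and add-one smoothing."""
    spam_total = sum(SPAM_COUNTS.values())
    ham_total = sum(HAM_COUNTS.values())
    log_odds = 0.0  # equal priors, so the prior log-odds are zero
    for word in text.split():
        if word not in VOCAB:
            continue  # words never seen in training carry no evidence
        p_spam = (SPAM_COUNTS[word] + 1) / (spam_total + len(VOCAB))
        p_ham = (HAM_COUNTS[word] + 1) / (ham_total + len(VOCAB))
        log_odds += math.log(p_spam / p_ham)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical page text: the spam content alone, then the same content
# padded with legitimate-looking words.
plain_page = "buy cheap pics now"
poisoned_page = plain_page + " weather forecast news search results today"

print("plain:   ", spam_probability(plain_page))
print("poisoned:", spam_probability(poisoned_page))
```

Each legitimate-looking word contributes negative log-odds, so the padded page scores lower than the spam content alone even though the spam content has not changed. A filter that followed the link and scored the whole page this way could be nudged below its threshold by enough padding.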

Another idea that was floated was that the links were there to enhance SEO (search engine optimization). But none of the sites were listed in any of the search engines we checked (users are meant to reach them through links in spam messages), so we ruled SEO out as a possibility.

I’d be very interested to hear if someone has another idea – hopefully one that makes more sense than either of these.

