Sourceforge.net is one of the most respected technology download sites on the Internet, as evidenced by its Google PageRank of 9 (out of a possible 10) and its ranking among the top 200 sites according to Alexa. Recently, however, Sourceforge became a lesson in the perils of user-generated content (UGC) on the Web.
One of Sourceforge’s subdomains is a wiki that allows users to add their own relevant content. Apparently, some spammers saw this as an opportunity for search engine bombing. They filled wiki pages with pornographic keywords and links to their pornography pages (see screenshot below). The keywords and links are designed to leverage a highly ranked site (in this case, Sourceforge) to provide inbound links to the spammers’ pornography site, causing it to rank highly in search engines as well.
Incidentally, the content placed on the Sourceforge wiki was “just” search engine spam; however, it could easily have been links to malware or other threats.
There are a few Web security lessons we can draw from this:
- When clicking on a link in a UGC site, you never know what you’re going to get. It could be fine, or it could be pornography or even malware. A high-quality Web security solution should be able to identify malicious and inappropriate content even if it’s buried deep in a UGC site.
- It’s not enough to rely on the domain’s category to determine if you should block or allow access to a particular page. In this example, the parent domain is categorized as computers. Even the subdomain of the wiki is OK from the standpoint of appropriateness, so it’s not enough to go by the subdomain, either. A top-notch Web security solution will analyze the full path of the URL to determine if the content needs to be blocked or allowed.
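To make the second point concrete, here is a minimal sketch of a lookup that tries the full URL path first and only then falls back to the subdomain and the parent domain. The category table, the `example.org` hosts, and the function names are all hypothetical, chosen for illustration; this is not how any particular product implements it.

```python
from urllib.parse import urlsplit

# Hypothetical category table, keyed by host plus an optional path prefix.
CATEGORIES = {
    "example.org": "computers",
    "wiki.example.org": "computers",
    "wiki.example.org/spam-page": "pornography",
}


def candidate_keys(url):
    """Yield lookup keys from most specific (full path) to least (parent domain)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    # Full path first, then progressively shorter path prefixes.
    for end in range(len(segments), 0, -1):
        yield host + "/" + "/".join(segments[:end])
    # Then the host itself, followed by each parent domain (never the bare TLD).
    labels = host.split(".")
    for start in range(max(1, len(labels) - 1)):
        yield ".".join(labels[start:])


def categorize(url, table=CATEGORIES):
    """Return the category of the most specific matching key."""
    for key in candidate_keys(url):
        if key in table:
            return table[key]
    return "uncategorized"
```

With this table, a page under `wiki.example.org/spam-page` is categorized as pornography even though both the wiki subdomain and the parent domain are categorized as computers. A filter that stopped at the domain or subdomain would have allowed it.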
Traditional URL filtering solutions that are based on local databases have difficulty dealing with this type of UGC or Web 2.0 content, since their limited space would run out very quickly if they tried to store such deep information about each site. Newer solutions based on a datacloud, such as Commtouch’s GlobalView URL Filtering, have effectively unlimited space in the cloud to analyze and store parent and child URLs, along with domains and subdomains. Each end-user organization receives categorizations for just the URLs it needs, tailored to the browsing habits of its users, rather than having its local database stuffed with irrelevant URLs.
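The general idea of keeping only the needed URLs locally can be sketched as a small cache in front of a remote lookup. This is an illustrative assumption about how such a design might look, not a description of GlobalView; `LocalCategoryCache`, `fake_cloud_lookup`, and the least-recently-used eviction policy are all hypothetical.

```python
from collections import OrderedDict


class LocalCategoryCache:
    """Small local store that holds only the URLs this organization has visited."""

    def __init__(self, remote_lookup, capacity=1000):
        self._remote_lookup = remote_lookup  # callable: url -> category
        self._capacity = capacity
        self._entries = OrderedDict()
        self.remote_calls = 0

    def category(self, url):
        if url in self._entries:
            # Cache hit: mark the entry as recently used and answer locally.
            self._entries.move_to_end(url)
            return self._entries[url]
        # Cache miss: ask the remote service, then remember the answer.
        self.remote_calls += 1
        result = self._remote_lookup(url)
        self._entries[url] = result
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._entries.popitem(last=False)
        return result

    def __len__(self):
        return len(self._entries)


def fake_cloud_lookup(url):
    """Stand-in for a remote categorization service (purely illustrative)."""
    return "pornography" if "spam" in url else "computers"
```

The local store never grows beyond its capacity, and it fills only with URLs that users at this site actually request, while the full parent and child URL data stays on the remote side.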