Evaluating URL Filtering Solutions: A step-by-step guide


Many URL filtering solutions are available, and you will probably find yourself evaluating them on both their business merits and their technology. In our experience providing URL filtering solutions, we've come across all sorts of methodologies for evaluating the underlying technology. Since there is no standard for testing these solutions, we want to present the three main points to consider when approaching such a task: coverage, accuracy and performance.


Coverage is defined as the ratio of URLs for which a solution provides a category to the total number of URLs tested. It is important to test each solution for coverage, since no URL filtering solution can include every web page on the Internet.

Two key elements are intertwined with coverage tests:

  • Relevancy – the test must use a set of URLs that are relevant/important to your needs.
  • Coverage over time – the test should evaluate how quickly coverage adapts to your changing needs. That is, how quickly are URLs that are listed as unknown at T0 categorized at T1?
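
The coverage-over-time question can be sketched in a few lines of Python. This is only an illustration: it assumes each test run is stored as a dictionary mapping a URL to its returned category, with None standing for "unknown". The function name, the data layout and the sample URLs are assumptions made for this example, not part of any vendor's interface.

```python
from typing import Dict, Optional

# Assumed layout: URL -> category, where None means "unknown".
Results = Dict[str, Optional[str]]


def newly_categorized_ratio(run_t0: Results, run_t1: Results) -> float:
    """Share of the URLs unknown at T0 that have a category at T1."""
    unknown_at_t0 = [url for url, cat in run_t0.items() if cat is None]
    if not unknown_at_t0:
        return 0.0  # nothing was unknown, so there is nothing to catch up on
    resolved = sum(1 for url in unknown_at_t0 if run_t1.get(url) is not None)
    return resolved / len(unknown_at_t0)


# Made-up data, for illustration only.
t0 = {"http://a.example/": "News", "http://b.example/x": None, "http://c.example/": None}
t1 = {"http://a.example/": "News", "http://b.example/x": "Sports", "http://c.example/": None}

ratio = newly_categorized_ratio(t0, t1)
print(f"{ratio:.0%} of initially unknown URLs were categorized by T1")
```

Comparing this ratio across solutions for the same T0 and T1 gives a rough picture of how quickly each one closes its coverage gaps.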

Here are a few tips when planning a coverage test:

  • When building a URL corpus, try to draw from popular sites.
  • Keep in mind that many site-popularity rankings list only domains, so it is important to test a balance of domains and full-path URLs.
  • Before running the test, eliminate duplicates and test for broken or unreachable URLs.
  • When running the test, count the number of “Known” and “Unknown” URLs in the results and calculate percentages.
  • In order to estimate coverage over time, run the test again with the same corpus after 12 hours, 1 day and 1 week.
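
The tips above can be sketched as a small Python helper. It assumes that URLs include a scheme such as http://, and that the solution's answers have been collected into a dictionary mapping each URL to a category, with None meaning "Unknown". All function names and sample URLs are illustrative assumptions. The check for broken or unreachable URLs is left out because it requires network access.

```python
from urllib.parse import urlsplit


def prepare_corpus(urls):
    """Strip whitespace, drop blank entries, and remove duplicates, keeping order."""
    seen = set()
    corpus = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            corpus.append(url)
    return corpus


def is_domain_only(url):
    """True when the URL has no path beyond '/' and no query string."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def coverage_report(results):
    """results maps URL -> category, with None meaning 'Unknown' (assumed layout)."""
    total = len(results)
    known = sum(1 for category in results.values() if category is not None)
    return {
        "total": total,
        "known": known,
        "unknown": total - known,
        "coverage_pct": 100.0 * known / total if total else 0.0,
    }


# Made-up data, for illustration only.
raw = ["http://a.example/", "http://a.example/", " http://b.example/news/1 ", "", "http://c.example"]
corpus = prepare_corpus(raw)
domain_only = sum(1 for url in corpus if is_domain_only(url))
print(f"{domain_only} domain-only URLs, {len(corpus) - domain_only} full-path URLs")

results = {corpus[0]: "News", corpus[1]: None, corpus[2]: "Shopping"}
report = coverage_report(results)
print(report)
```

To estimate coverage over time, the same corpus can be run again after 12 hours, 1 day and 1 week, with one report kept per run.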


Accuracy is tested by comparing the categories provided by the tested solutions with known, manually qualified categories for a corpus of URLs. It is important to use recently categorized URLs, as changes to Web sites may bias test results. Categorization should be done by a professional expert for accurate results.

It is recommended that you split the accuracy test into two parts:

  • Regular Categories – Can be the same URLs that you used for the coverage test.
  • Zero-Hour Threat Categories – Sites in security-related and child pornography categories. These categories are highly time-sensitive and therefore require a different approach.

Here are a few tips you need to consider when planning an accuracy test:

  • Keep in mind that since different vendors use various category names, and since some categories are “close enough” to others, you must define ahead of time which categories will be accepted as a correct match (e.g. an article describing the behavior of the stock exchange could be correctly categorized by any of several categories including Business, Finance, News etc.).
  • Due to the nature of Zero-Hour Threat Categories like phishing, malware, child pornography, for which the URLs are typically short-lived, these tests must be based on up-to-date URL lists.
  • One way to build a real-time list is to use the URLs found in email quarantine. It is recommended that you build a corpus of URLs drawn from quarantined messages no more than two hours old.
  • When you use a list of child pornography URLs, it is extremely important to be aware of the legal compliance issues surrounding these URLs; in some countries it is illegal even to possess such a list. Because of this, you are advised to seek professional help or third-party data to obtain this list.
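
The first tip, deciding ahead of time which categories count as a correct match, can be sketched as follows. The category names, the sample URLs and the decision to leave unknown URLs out of the accuracy score (they are already counted by the coverage test) are assumptions made for this example. You should define your own mapping for each vendor.

```python
def accuracy(expected, returned, accepted):
    """
    expected: URL -> manually assigned reference category
    returned: URL -> category returned by the solution (None if unknown)
    accepted: reference category -> set of vendor categories counted as a match
    """
    scored = 0
    correct = 0
    for url, reference in expected.items():
        vendor_category = returned.get(url)
        if vendor_category is None:
            continue  # unknown URLs are a coverage issue, not scored here
        scored += 1
        # Fall back to an exact match when no mapping was defined.
        if vendor_category in accepted.get(reference, {reference}):
            correct += 1
    return correct / scored if scored else 0.0


# Made-up data, for illustration only.
accepted = {"Finance": {"Finance", "Business", "News"}}
expected = {
    "http://markets.example/report": "Finance",
    "http://games.example/": "Games",
    "http://bank.example/": "Finance",
}
returned = {
    "http://markets.example/report": "Business",  # accepted as a match
    "http://games.example/": "Shopping",          # mismatch
    "http://bank.example/": None,                 # unknown, not scored
}

score = accuracy(expected, returned, accepted)
print(f"accuracy: {score:.0%}")
```

Fixing the accepted-match table before looking at any results keeps the comparison fair across vendors.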


Performance tests evaluate the parameters that most affect user experience. A solution that provides good detection but degrades user experience or requires large amounts of resources may not be acceptable, so performance can be a key decision factor.

Here are a few tips you need to consider when planning a performance test:

  • Although performance is the hardest and longest part to evaluate, shortcuts may cause biased results.
  • Running systems over long periods of time (at least a few days) sometimes allows you to identify peak resource usage and the potential risk to the overall system. For example, some database systems receive periodic updates which may result in a system freeze or loss of network connectivity at these times.
  • It is important to test a scenario that closely resembles real-world usage. It is recommended that you run the evaluated solutions in a production network with real traffic. Alternatively, collect real traffic from sample users (ISP, enterprise or others) through network sniffers and run it through the evaluated solutions. If you do this, be sure to choose the network that most resembles that of your users.
  • Try to use a small utility that measures the time it takes for a solution to return a URL classification and reports the average latency at the end of the test. Make sure the utility works only in memory, without writing to disk, so that it does not add latency of its own.
  • It is also important to install a sniffer assigned to the port that the solution is using. The sniffer should count the bytes used on that port and a script/process should log memory, CPU and disk usage plus cumulative bandwidth consumption every minute from the start of the test until the end.
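
A minimal sketch of the latency utility described above might look like this. The `classify` argument stands for whatever lookup call the evaluated solution exposes; `fake_classify` is a hypothetical stand-in so that the example runs on its own. All timings are kept in a list in memory and summarized only at the end.

```python
import time
from statistics import mean


def measure_latency(classify, urls):
    """Time each classify(url) call; samples stay in memory, nothing is written to disk."""
    samples = []
    for url in urls:
        start = time.perf_counter()
        classify(url)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Average and worst-case latency in milliseconds."""
    if not samples:
        return {"count": 0, "avg_ms": 0.0, "max_ms": 0.0}
    return {
        "count": len(samples),
        "avg_ms": 1000.0 * mean(samples),
        "max_ms": 1000.0 * max(samples),
    }


def fake_classify(url):
    # Hypothetical stand-in for the real lookup call of the solution under test.
    return "News" if "news" in url else None


samples = measure_latency(fake_classify, ["http://news.example/", "http://x.example/a"])
stats = summarize(samples)
print(stats)
```

Reporting the worst case alongside the average helps expose the periodic stalls mentioned above, which an average alone can hide.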


This is just a basic outline for testing URL filtering solutions, and we recommend that you adjust the parameters to fit your organization. For more information about the evaluation process or how to customize it for your needs, please contact bizdev@commtouch.com. For more information about the Commtouch GlobalView URL Filtering solution, visit our Web site.
