Spamhaus’ ZEN blacklist efficiency

At work, I'm using Spamhaus' Zen blacklist for many years now. For a huge organization the amount of daily checks makes it impossible to rely on the free Spamhaus service. So we pay for a local copy of the blacklist, rsynched every 20 minutes. It allows faster check too. When you use Spamhaus' blacklists as a paid service, the question is: how can you rate your return on investment? In an attempt to answer this question, I've gathered 2 years and 9 months worth of mail server log files (12 GB bzip2) and extracted some data.

I use greylist, blacklist, whitelist, antispam/antivirus, and recipient-based filtering. So it make things quite complicated when I need clear statistics about what is going on. The MX server accepts around 1.3 million messages a month for internal delivery, but many more come knocking at the door.
The main purpose of blacklisting is to limit the amount of emails going thru expensive filters. Greylist and blacklist are cheap filters: they are fast, and they cost very few CPU, memory, and network resources. In comparison, antispam and antivirus filters are expensive: they are slow, and have a huge CPU, memory and network usage. I do before-queue content filtering. It means the MX server will scan for spam and virus before the email is accepted. So all the filtering process must take place during the SMTP session, and that's a pretty hard thing to do. To make sure that spam and virus filters are available for fast analysis, you must block as much as bad emails with cheap (and fast) filters.

So here is the deal: use of paid RBL (Spamhaus' Zen, here) is only relevant if expensive filters cannot cope with the traffic. Below, the gnuplot output for MX log files between 20100101 and 20120928. It shows in red the number of hits in zen RBL, in green the number of emails coming out of Amavisd-new as "clean", and in blue the number of spam blocked by Amavisd-new.

2 years and 9 months of data: daily spam count in blue ("Blocked Spam" in amavisd-new logs), daily clean count ("Passed") in green, daily blacklist hits in red (blocked using zen.dnsbl-local in Postfix logs).

It shows, mainly, that hits in the blacklist have plummeted, when levels of spam and clean emails out of Amavisd-new are fairly constant. So while the blacklist is blocking at least 5 times less incoming SMTP transactions, the amount of emails reaching the antispam does not change. Spamhaus' zen blacklist efficiency is good (no increase in spam detection), but becomes less useful every day.

Below, the gnuplot output for the same time period showing the number of lines in Postfix logs :

total number of lines in postfix logs, daily basis

total number of lines in postfix logs, daily basis

It's good enough to stand for the evolution of the number of SMTP sessions and it's very similar to the curve of blacklist hits. Then, we can conclude that an external factor is responsible for the drop in incoming unwanted SMTP transactions. The war on botnets really took off in 2010, so may be we have an explanation here.

Lets go back to the main idea of this post: does zen blacklist worth its yearly fee? Based on my experience, yes it does, or at least, it did. From 2008 to 2011, it was clearly an asset. Before-queue content filtering would have been absolutely unusable. Reducing dramatically the load on the antivirus and antispam, zen RBL allowed me to handle about 1,300,000 clean email messages per month on a single MX server with b-qcf (on a 6 years old Apple XServe).
On 2012, zen blacklist still blocks a good daily amount of spam before it reaches the real antispam. But it's clear that if the trend continues, I will not renew the Spamhaus subscription on september 2013. At this rate, usefulness of this blacklist will not worth its cost by the end of 2012. Now that big vendors like Microsoft have embraced the war on botnets, I'm pretty confident that I won't need zen RBL any longer.

À lire aussi :

2 comments

  1. Hi Patrick.

    We are a group of students, from at The University of Aalborg (Denmark). We are currently writing a report on spam on the internet, and we are looking into the efficiency DNSBL for the end-user. We find it that the data and statistics presented in this post, would be extremly viable for us to use in our report.
    We would like to ask for premission, to use the information and graphs in this post for our report, while ofcourse crediting you in the report.

    Best regards - Emil Kongsgaard

  2. Hi Emil,

    Feel free to borrow any info/graphs you might need.
    You'll find some files here https://www.patpro.net/filez/9z2s92u (10 days limited availability). Including raw data, and gnuplot script. Unfortunately one file is missing, but everything you need to recreate the missing file and generate new graphs (may be in PDF or EPS for the report) is available in the package. I've used gnuplot 4.6 patchlevel 0, but any other recent version should work.

Laisser un commentaire

Votre adresse de messagerie ne sera pas publiée. Les champs obligatoires sont indiqués avec *