SpamAssassin - how to sharpen the filter?

#1 Wed, 01/04/2012 - 05:10
flameproof

I get the general feeling that SpamAssassin really sucks. I also use Gmail to collect mail from some of the same mailboxes, and Gmail is WAY WAY better at filtering spam.

The Bayesian filter doesn't seem to work at all (yes, it is enabled). For weeks I have been getting the same type of SPAM (Rolex from Russia, using my own email address as the sender). I always report it in Usermin, and still it's not counted as SPAM. I would say the effectiveness is about 50% (Gmail's is 90%).

I am already thinking of disabling SpamAssassin completely and having my mail run through Gmail.

I would really appreciate your suggestions on how to make it work better.

Wed, 01/04/2012 - 11:02
andreychek

Howdy,

Well, Google's spam filters are indeed better -- I'd expect to see less spam with email on Gmail than on a local server running SpamAssassin.

Now, for improving spam filtering -- you'd want to make sure that you were training SpamAssassin, by reporting spam to it. That's something you can do from Usermin by clicking the "Report Spam" button.

Another tool you may want to bring into the mix is Greylisting. Using Greylisting can really make a big difference in how much spam you're seeing.

You can enable that by going into Email Messages -> Email Greylisting.

-Eric

Wed, 01/04/2012 - 21:55
flameproof

I'd like to try! But it failed....

Installing the Postgrey package ..

Installing package(s) with command yum -y install postgrey ..

Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package postgrey.noarch 0:1.34-1.el5.rf set to be updated
--> Processing Dependency: perl(IO::Multiplex) for package: postgrey
--> Processing Dependency: perl(Parse::Syslog) for package: postgrey
--> Processing Dependency: perl(BerkeleyDB) for package: postgrey
--> Running transaction check
---> Package perl-BerkeleyDB.i386 0:0.43-1.el5.rf set to be updated
---> Package perl-IO-Multiplex.noarch 0:1.10-3.el5.vm set to be updated
---> Package perl-Parse-Syslog.noarch 0:1.10-1.el5.rf set to be updated
memory alloc (43132 bytes) returned NULL.

.. install failed!

Any hint as to what the reason may be?

Wed, 01/04/2012 - 22:20
andreychek

Well, I see two problems there.

One, it looks as if you have a third party software repository enabled -- that can cause problems, and we don't recommend it.

Second, it looks like you're running into memory errors... are you by chance using an OpenVZ-based VPS? Also, what does "free -m" show?

-Eric

Wed, 01/04/2012 - 22:46
flameproof
# yum repolist all
repo id              repo name                                 status
addons               CentOS-5 - Addons                         enabled
base                 CentOS-5 - Base                           enabled
c5-media             CentOS-5 - Media                          disabled
centosplus           CentOS-5 - Plus                           disabled
extras               CentOS-5 - Extras                         enabled
rpmforge             RHEL 5 - RPMforge.net - dag               enabled
rpmforge-extras      RHEL 5 - RPMforge.net - extras            disabled
rpmforge-testing     RHEL 5 - RPMforge.net - testing           disabled
updates              CentOS-5 - Updates                        enabled
virtualmin           Red Hat Enterprise 5 - i386 - Virtualmin  enabled
virtualmin-universal Virtualmin Distribution Neutral           enabled

Anything there I should remove?

I think my VPS runs Virtuozzo.

# free -m
             total       used       free     shared    buffers     cached
Mem:           286        183        102          0          0          0
-/+ buffers/cache:        183        102
Swap:            0          0          0
Thu, 01/05/2012 - 07:29
andreychek

I would suggest disabling the RPMForge repository -- some of the packages within it can conflict with the ones provided by Virtualmin, which can cause some strange issues to come up.

However, I think the issue you're running into now is one of available RAM. You don't appear to have much of it :-)

One of the problems with OpenVZ/Virtuozzo is that you don't have any swap -- so it's really easy to run out of RAM, especially in your case, where you only have 286MB in total.

I would suggest adding more RAM there.

However, you may also want to look at the low memory guide:

http://www.virtualmin.com/documentation/system/low-memory

Fri, 01/06/2012 - 01:57
flameproof

Does that look better?

# yum repolist all
repo id              repo name                                 status
addons               CentOS-5 - Addons                         enabled
base                 CentOS-5 - Base                           enabled
c5-media             CentOS-5 - Media                          disabled
centosplus           CentOS-5 - Plus                           disabled
extras               CentOS-5 - Extras                         enabled
updates              CentOS-5 - Updates                        enabled
virtualmin           Red Hat Enterprise 5 - i386 - Virtualmin  enabled
virtualmin-universal Virtualmin Distribution Neutral           enabled

I know, not much RAM there, but that's all my host gives me. I will read that link first before I come back with more questions....

Fri, 01/06/2012 - 04:40
flameproof

...and as I remembered correctly, the VPS runs better after a reboot. Greylisting is installed now!

So when I click the button I get:

Enabling Postgrey at boot time ..
.. already enabled
 
Starting Postgrey server ..
.. started OK
 
Configuring Postfix to use Postgrey ..
.. configured to use port

But when I go back....

Email Greylisting > [Enable Greylisting] > Greylisting is not currently fully enabled on your system.

How come?

//-------------------late update------------

PuTTY gave me an error when I tried to start postgrey manually:

Starting postgrey: Module IO::Multiplex is required for Multiplex. at /usr/lib/perl5/vendor_perl/5.8.8/Net/Server/Multiplex.pm line 32.

Then I tried:

 yum install perl-IO-Multiplex

And everything is fine now!

Tue, 01/10/2012 - 03:43
flameproof

Postgrey seems quite effective, or spammers are taking a holiday (which is a bit unlikely, I guess).

I found this very useful how-to:

http://wiki.centos.org/HowTos/postgrey

Is there a way to move "postgreyreport" somewhere where it is easily accessible? I worry (just a little, though) that some legitimate email gets rejected (I have zero proof, it's just a worry).
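
postgreyreport works from the mail log. As a rough do-it-yourself alternative, here is a minimal Python sketch (the function names are hypothetical) that lists client/sender/recipient triplets that were greylisted but never logged as passing, which is the mail most worth a second look. It assumes postgrey writes `action=greylist` / `action=pass` lines with `client_address=`, `sender=` and `recipient=` fields, as recent postgrey versions do; compare against your own /var/log/maillog before relying on it.

```python
import re

# Pulls key=value pairs such as "client_address=192.0.2.1" out of a postgrey log line.
PAIR_RE = re.compile(r"(\w+)=([^,]*)")


def parse_postgrey_line(line):
    """Return postgrey's key=value fields as a dict, or None for other log lines."""
    if "postgrey[" not in line or "action=" not in line:
        return None
    message = line.split("]:", 1)[-1]  # drop the syslog prefix "... postgrey[PID]:"
    return {key: value.strip() for key, value in PAIR_RE.findall(message)}


def never_passed(lines):
    """Map (client_address, sender, recipient) -> number of greylist entries,
    for triplets that never appear with action=pass in the given lines."""
    greylisted, passed = {}, set()
    for line in lines:
        fields = parse_postgrey_line(line)
        if fields is None:
            continue
        triplet = (fields.get("client_address"), fields.get("sender"),
                   fields.get("recipient"))
        if fields.get("action") == "greylist":
            greylisted[triplet] = greylisted.get(triplet, 0) + 1
        elif fields.get("action") == "pass":
            passed.add(triplet)
    return {t: n for t, n in greylisted.items() if t not in passed}
```

For example, `never_passed(open("/var/log/maillog", errors="replace"))` returns each such triplet with its greylist count. Legitimate servers normally retry, so a known, legitimate sender showing up here repeatedly is the case to worry about.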

Thu, 01/12/2012 - 19:56
flameproof

Postgrey works great and I get way less SPAM. So far so good, but I still get one very persistent email.

The sender address is the mailbox's own address.

The headers say: "USER_IN_WHITELIST"

But when I go to Webmin I see:

Your auto-whitelist file /home/mydomain/.spamassassin/auto-whitelist does not contain any entries. It will be populated by SpamAssassin as mail is processed by the system.

Return-Path: <me.mydomain.com>
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
    xxxxx.myvps.com
X-Spam-Level: 
X-Spam-Status: No, score=-68.6 required=3.0 tests=BAYES_80,FSL_HELO_NON_FQDN_1,
    HELO_NO_DOMAIN,HS_INDEX_PARAM,HTML_IMAGE_ONLY_08,HTML_MESSAGE,
    HTML_SHORT_LINK_IMG_1,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,
    RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_BRBL_LASTEXT,RCVD_IN_PBL,
    RCVD_IN_PSBL,RCVD_IN_XBL,RDNS_NONE,T_SURBL_MULTI1,T_SURBL_MULTI2,
    T_SURBL_MULTI3,T_URIBL_BLACK_OVERLAP,URIBL_AB_SURBL,URIBL_BLACK,
    URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_RHS_DOB,URIBL_SBL,URIBL_SC_SURBL,
    URIBL_WS_SURBL,USER_IN_WHITELIST autolearn=no version=3.3.1
X-Original-To: me.mydomain.com
Delivered-To: me.mydomain@xxxxx.myvps.com
Received: from azeem-409fde109 (unknown [182.178.165.144])
    by xxxxx.myvps.com (Postfix) with SMTP id 564BF565AFFA
    for <me.mydomain.com>; Thu, 12 Jan 2012 09:50:52 -0600 (CST)
Message-ID: <20120112075040.2856.qmail@azeem-409fde109>
To: <me.mydomain.com>
Subject: me.mydomain.com Rolex Today -19%
From: <me.mydomain.com>
MIME-Version: 1.0
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Date: Thu, 12 Jan 2012 09:50:52 -0600 (CST)
 
[body deleted]

Which whitelist could it still be in, then?

Tue, 01/17/2012 - 14:28
andreychek

I don't know where the SpamAssassin auto-whitelist is stored -- but more recent versions of SpamAssassin don't actually use the auto-whitelist by default, probably due to issues such as what you're seeing.

You could always disable that feature, or modify the "USER_IN_WHITELIST" score to have a lower weight.

There are details on that feature here:

http://wiki.apache.org/spamassassin/AutoWhitelist

Wed, 01/18/2012 - 22:12
flameproof

Well, I blacklisted myself and the problem is gone.

After about a week of using Postgrey, I must say that the change is AMAZING. I would say my results are now better than Gmail's.

Thanks for including Postgrey in your software!

Thu, 08/07/2014 - 01:32
flameproof

For some time now, lots of SPAM has been getting through again. SpamAssassin is working, and Postgrey is working (spamd and postgrey both show up in top).

One thing I wonder: how do I know that Postgrey is actually working? I don't see any mail headers mentioning Postgrey.

I don't get it. For a few months it was working extremely well, and suddenly the effect seems to be zero.

Latest mail headers:

Return-Path: <WhosWho@one.jezreeloil.com>
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on me@myhost.com
X-Spam-Level: **
X-Spam-Status: No, score=2.8 required=3.0 tests=BAYES_50,DEAR_SOMETHING, HTML_MESSAGE,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD autolearn=no version=3.3.1
X-Original-To: me@mydomain
Delivered-To: me@me@myhost.com
Received: from one.jezreeloil.com (one.jezreeloil.com [84.200.77.157]) by me@myhost.com (Postfix) with ESMTP id 172D1584000E for <me@mydomain>; Wed, 6 Aug 2014 22:52:30 -0500 (CDT)
Received: by one.jezreeloil.com id hsbthk0001gf for <me@mydomain>; Wed, 6 Aug 2014 23:54:33 -0400 (envelope-from <WhosWho@one.jezreeloil.com>)
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="8d17fcea76171b7"
From: "Whos Who" <WhosWho@one.jezreeloil.com>
To: me@mydomain
Subject: Congratulations! You're a 2014 Candidate for Who's Who! Confirm Now.
Message-ID: <0.0.0.886.1CFB1F34C9953E6.AC20D2@one.jezreeloil.com>
Date: Wed, 6 Aug 2014 23:54:33 -0400
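
The X-Spam-Status header already shows how close a message came: here score=2.8 against required=3.0, so it missed by only 0.2. A minimal Python sketch (a hypothetical helper, not part of SpamAssassin) that pulls the verdict, score, threshold and rule list out of that header, which is handy when comparing many messages that slipped through:

```python
import re


def parse_spam_status(header):
    """Split an X-Spam-Status header into verdict, score, threshold and rule names."""
    text = " ".join(header.split())  # undo header folding and repeated whitespace
    if text.lower().startswith("x-spam-status:"):
        text = text.split(":", 1)[1].strip()
    verdict = text.split(",", 1)[0].strip()  # "Yes" or "No"
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", text).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", text).group(1))
    # The rule list runs until the next "name=" field (autolearn=, version=, ...).
    found = re.search(r"tests=(.*?)(?:\s+\w+=|$)", text)
    tests = [t.strip() for t in found.group(1).split(",") if t.strip()] if found else []
    return {
        "verdict": verdict,
        "score": score,
        "required": required,
        "margin": required - score,  # > 0 means the message stayed under the threshold
        "tests": tests,
    }
```

Run on the header above, it reports a margin of about 0.2 with BAYES_50 among the tests, i.e. the Bayes database had no strong opinion either way on this message.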
Thu, 08/07/2014 - 10:16
andreychek

Howdy,

I don't believe Postgrey adds any email headers, but it does generate log messages in your email log. You should see at least one log message from Postgrey for each incoming email.

-Eric

Fri, 08/08/2014 - 20:40
flameproof

Is that /var/log/maillog?

I didn't see anything there, so I did:

[root@vps]# /etc/init.d/postgrey restart
Stopping postgrey:                                         [  OK  ]
Starting postgrey:                                         [  OK  ]

// so it was ON I guess

// then /var/log/maillog had:

Aug  8 19:53:54 vps-323 postgrey[3703]: 2014/08/08-19:53:54 Server closing!
Aug  8 19:53:54 vps-323 postgrey[13909]: Process Backgrounded
Aug  8 19:53:54 vps-323 postgrey[13909]: 2014/08/08-19:53:54 postgrey (type Net::Server::Multiplex) starting! pid(13909)
Aug  8 19:53:54 vps-323 postgrey[13909]: Binding to UNIX socket file /var/spool/postfix/postgrey/socket using SOCK_STREAM 
Aug  8 19:53:54 vps-323 postgrey[13909]: Setting gid to "105 105"
Aug  8 19:53:54 vps-323 postgrey[13909]: Setting uid to "104"

For individual messages I see no Postgrey entries in the log. Anything I can test from here?
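
One thing worth ruling out (an assumption, not a diagnosis): postgrey only sees mail if Postfix's smtpd_recipient_restrictions contains a check_policy_service entry pointing at the same socket or port postgrey listens on. The restart log above shows postgrey binding to the UNIX socket /var/spool/postfix/postgrey/socket, while Virtualmin earlier reported it "configured to use port". A minimal Python sketch (helper names hypothetical) that reads `postconf -n` or main.cf text, lists the policy services, and checks whether a unix: target resolves to that socket. As I understand the Postfix docs, relative unix: paths are taken relative to the queue directory.

```python
import posixpath


def read_params(text):
    """Parse main.cf / `postconf -n` text into {name: value}, joining continuation lines."""
    params, current = {}, None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        if raw[0].isspace() and current:
            params[current] += " " + raw.strip()  # indented line continues the previous value
        elif "=" in raw:
            name, value = raw.split("=", 1)
            current = name.strip()
            params[current] = value.strip()
    return params


def policy_services(text, param="smtpd_recipient_restrictions"):
    """Return the target of every check_policy_service entry in `param`."""
    tokens = read_params(text).get(param, "").replace(",", " ").split()
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1])
            if tok == "check_policy_service"]


def points_at_socket(target, socket_path, queue_dir="/var/spool/postfix"):
    """True if a unix: policy target resolves to socket_path."""
    if not target.startswith("unix:"):
        return False
    path = target[len("unix:"):]
    if not path.startswith("/"):
        path = posixpath.join(queue_dir, path)  # relative to the Postfix queue directory
    return posixpath.normpath(path) == posixpath.normpath(socket_path)
```

Feed it the output of `postconf -n`: if none of the returned targets points at the socket from the log (an `inet:` target would need postgrey listening on that port instead), Postfix is probably not asking postgrey about any mail.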

Sun, 09/21/2014 - 08:17
rapidwebs

I went over this with somebody in an earlier post; let me see if I can find it. Note, however, that you will eventually need to do some hand-tweaking of certain aspects of SpamAssassin's different filters. You will also need to find something that works comfortably for you. It's somewhat difficult to use a cookie-cutter solution in this scenario.

Anyway, if I find that link, I'll reply with the URL.

Sun, 09/21/2014 - 10:08
rapidwebs

Hmm, I can't seem to find the conversation I had anywhere.

I apologize for the novel I'm about to write, but I plan to use this as an example in the future, so hopefully it helps. Don't take offence if I go over things you already understand, or if I start explaining them in an oversimplified manner.

I expect that you have a good understanding of these things, but I'd feel better if I wrote this so that anybody who comes along and finds it can read it too.

Then again, I have a tendency to confuse people, so who knows :P

1) What are you trying to achieve? What is your goal?

I.e., are you hosting personal email for yourself, or is this a high-value customer?

How computer-savvy are the email users? What environment will this email server be used in?

How much security do you require? How much time do you have to spend on this?

These things matter for two main reasons. One is that you need to decide just how much spam is acceptable. Not that you want any, but the filter requires tuning, and you don't want to deny legitimate mail during the tuning phase. The second reason is that a lot of people who use email should probably not even be using a computer ;)

More simply put, and to the last question: how much security do they need?

Once you answer these questions, you can define what needs to be done, and start.

Note: this sounds like a headache, but once it's DONE, and done RIGHT, you will appreciate it, and it will just work. There is nothing worse than a customer whose highly important email keeps getting destroyed by the spam filter. THIS can be a much bigger headache, as I am sure you are aware :)

2) Make sure you enable every plugin available in SpamAssassin.

The spam filter comes with a lot of features, but some aren't enabled out of the box on some systems. Simply head over to /etc/spamassassin/ (on most systems; CentOS/RHEL uses /etc/mail/spamassassin/) and go through these files. On Debian/Ubuntu, the ones in question are titled v310.pre through v330.pre.

You will likely just need to remove the comment character (#) from the lines that contain text like this:

"loadplugin Mail::SpamAssassin::Plugin::DCC"

However, make sure you restart SpamAssassin, and check that all these plugins are actually functional on your system before assuming they are operational.
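
To see which plugins are actually switched on, a minimal Python sketch (function names hypothetical) that reads the *.pre files and reports each `loadplugin` line as loaded or commented out. It only reads the files; whether a plugin really works (DCC, Pyzor and Razor, for instance, also need their own client software installed) still has to be checked after restarting SpamAssassin.

```python
import re
from pathlib import Path

# "loadplugin Name [path]", optionally commented out with one or more '#'.
LOADPLUGIN_RE = re.compile(r"^\s*(#+)?\s*loadplugin\s+(\S+)")


def plugin_status(text):
    """Map each plugin named in a .pre file to True (loaded) or False (commented out)."""
    status = {}
    for line in text.splitlines():
        match = LOADPLUGIN_RE.match(line)
        if match:
            status[match.group(2)] = match.group(1) is None
    return status


def scan_pre_files(conf_dir="/etc/spamassassin"):
    """Combine all *.pre files in conf_dir; a plugin counts as loaded if any file loads it."""
    report = {}
    for pre_file in sorted(Path(conf_dir).glob("*.pre")):
        for name, loaded in plugin_status(pre_file.read_text(errors="replace")).items():
            report[name] = report.get(name, False) or loaded
    return report


def disabled_plugins(conf_dir="/etc/spamassassin"):
    """Names of plugins that only appear in commented-out loadplugin lines."""
    return sorted(name for name, loaded in scan_pre_files(conf_dir).items() if not loaded)
```

On CentOS you would call `disabled_plugins("/etc/mail/spamassassin")`; on Debian/Ubuntu the default directory applies.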

3) Make use of ClamAV.

More specifically, ClamAV can do a lot more than just check for viruses. For example, on Ubuntu there is a package called "clamav-unofficial-sigs". It might take a little bit of configuration (see its man page), but it includes definitions to detect many more types of viruses, phishing, some types of spam, and a lot of the other junk that plagues email inboxes daily.

4) Take a very good look at the SpamAssassin configuration (/etc/spamassassin/local.cf).

There are some things that are great for helping deter spam with SpamAssassin, but a lot of people just enable the plugin and assume all the work is over with.

In reality, however, a lot of the plugins require configuration. Some of it is present in the existing config, some is present but commented out, and some is missing entirely. I will attach my configuration to this post as a sort of starting point.

Some things to make note of:

a) Simple switches to turn on some plugins, such as:

use_bayes 1
use_pyzor 1
use_razor2 1
skip_rbl_checks 0
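
A matching sketch for local.cf itself: it reports which of the switches above are missing or set differently. Assumptions (labelled as such): one `option value` pair per line, `#` starts a comment, and the last occurrence of an option wins. An absent option simply means SpamAssassin's built-in default applies. The names are hypothetical.

```python
# The switches discussed above and the values we want for them.
WANTED = {"use_bayes": "1", "use_pyzor": "1", "use_razor2": "1", "skip_rbl_checks": "0"}


def read_options(text):
    """Parse 'option value' lines; '#' starts a comment, the last occurrence wins."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return options


def check_switches(text, wanted=WANTED):
    """Return {option: (wanted, actual)} for each switch not set as wanted.
    actual is None when the option is absent, i.e. SpamAssassin's default applies."""
    options = read_options(text)
    return {name: (value, options.get(name))
            for name, value in wanted.items() if options.get(name) != value}
```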

b) Short-circuit blocks and whitelists are your friends. These are not exactly for helping with the spam problem, but they can reduce the time SpamAssassin spends working on legitimate mail (freeing up resources on your server sooner).

c) The URI blacklist (URIBL) plugin

This is incredibly useful and helps quite a bit with spam, but I don't think it's actually configured by default. It's like a DNSBL, but for URLs: suspicious or malicious URLs found in the email body can add to the spam score, helping identify spam. "Very bad" URLs, such as links to malware, can be used to trigger SpamAssassin to destroy the message entirely, before somebody less savvy infects the network by accident.
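
For the curious, DNS blocklists work through plain DNS lookups. A minimal Python sketch of how the query names are built: IP-based lists (the RCVD_IN_* rules) reverse the IPv4 octets and append the list's zone, while domain-based URI lists (URIBL_*, URIBL_DBL_*) prepend the domain found in the link. The zones are only examples, each list has its own usage policy, and SpamAssassin's URIDNSBL plugin also reduces each link to its registered domain first, which this sketch leaves out. The function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_name(ip, zone):
    """Query name for an IP-based list: reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def uribl_name(domain, zone):
    """Query name for a domain-based URI list: the link's domain + zone."""
    return domain.strip().rstrip(".").lower() + "." + zone


def lookup(query_name):
    """Return the listing address (typically 127.0.0.x) or None if not listed.
    Needs working DNS; what each return code means is defined by the list."""
    try:
        return socket.gethostbyname(query_name)
    except socket.gaierror:
        return None
```

For example, `lookup(dnsbl_name("182.178.165.144", "zen.spamhaus.org"))` would check the sending IP of the Rolex message earlier in this thread. Many lists refuse queries that arrive via large public resolvers, so run it against your server's own resolver.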

d) The DNS whitelist. Again, not exactly for finding spam, but we need to separate the real stuff from the spam in order to properly train our filter.

Attribute a score decrease if the sender is listed in the DNS whitelist, for example:

score RCVD_IN_DNSWL_LOW -1.750
score RCVD_IN_DNSWL_MED -2.000
score RCVD_IN_DNSWL_HI -2.500

Or, if the sender is listed but with no trust level (RCVD_IN_DNSWL_NONE), that is neither good nor bad, so let's add a very small increase:

score RCVD_IN_DNSWL_NONE 0.500

e) When you get the time, and after you answer the question "just how much spam is acceptable?", accept the fact that we are going to have to fine-tune things.

The stock configuration is just what it sounds like: a stock configuration. It's difficult to coerce it into doing what you want.

Generally what I do, as you will see in my example configuration, is go through and fine-tune the scores for many of the plugins and processes in SpamAssassin. It can be a few hours of tedious work, but once it's done, you will have a much better idea of just what is going on during the spam classification phase.

Remember to choose a threshold for bottom-of-the-barrel, throw-away spam. I like to set this to around 10; some people prefer 5. It doesn't really matter, because we are going to fine-tune the scoring system to suit our needs anyway. I'm fairly certain this is set in Virtualmin and is a function of procmail, so look for it in your server templates or somewhere else in Virtualmin. I would also suggest NOT deleting "virus" mail, and instead storing it in a virus directory (if you are going to beef up your ClamAV signatures). The reason is that ClamAV is now doing more than just classifying viruses.
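
The two thresholds can be pictured as three bins. A minimal Python sketch with hypothetical values of 5 (SpamAssassin's required_score) and 10 (the separate throw-away level, e.g. a procmail rule); it illustrates the idea and is not how Virtualmin implements it. Virus mail is not modelled here.

```python
def route(score, required=5.0, discard_at=10.0):
    """Where a message ends up, given its SpamAssassin score (hypothetical thresholds).

    required   -- SpamAssassin's required_score: at or above it the message is tagged as spam
    discard_at -- a separate, higher throw-away level (e.g. enforced by a procmail rule)
    """
    if discard_at <= required:
        raise ValueError("the throw-away level must be above required_score")
    if score >= discard_at:
        return "discard"
    if score >= required:
        return "spam folder"
    return "inbox"
```

With these numbers, a 9.9 lands in the spam folder where a user can still rescue it, and only 10 or more is thrown away; the gap between the two levels is your safety margin while tuning.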

Once we have chosen this threshold, we can start the tuning process.

For example, email that doesn't pass SPF record checks is, in this day and age, likely garbage, and if it's not garbage, it should be. The same goes for DKIM records: having no record is fine, but if the record is bad or broken, then the message is likely junk. Things like this deserve an obviously high score from the get-go.

I think the trick is to have all the obvious signs of junk mail add up to a score JUST BELOW your "throw away" spam level (the one we talked about earlier). This way, SpamAssassin can work its magic and push the message over the upper limit. We aren't going to tune everything; we are just going to tune the things that point to the email being obvious spam.
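
The "just below the throw-away level" idea is plain arithmetic. A small Python sketch with hypothetical scores for three real rule names (SPF_FAIL, DKIM_INVALID, RCVD_IN_XBL); substitute your own values and rules:

```python
def headroom(obvious_scores, discard_at=10.0):
    """Points left between the combined 'obvious junk' rules and the throw-away level."""
    return round(discard_at - sum(obvious_scores.values()), 3)


def just_below(obvious_scores, discard_at=10.0, max_gap=1.0):
    """True if the obvious rules alone stay under discard_at, but by no more than max_gap."""
    gap = headroom(obvious_scores, discard_at)
    return 0 < gap <= max_gap


# Hypothetical scores for three real rule names; tune them to your own mail.
OBVIOUS = {"SPF_FAIL": 4.0, "DKIM_INVALID": 3.0, "RCVD_IN_XBL": 2.5}
```

Here 4.0 + 3.0 + 2.5 = 9.5 against a throw-away level of 10 leaves 0.5 of headroom, so one more hit (BAYES_50, a URIBL rule, ...) pushes the message over.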

Another thing to take into consideration is adjusting the score values for the Bayes classifier. Anything that has a 99% probability of being spam should get a fairly high score. We can then stagger this down to something that has a 10% chance. At that low end, I actually deduct points. Why? If Bayes gives a message a 10% chance of being spam, I'd say that is a fairly low chance, and the classifier is likely just splitting hairs. We don't EVER want to classify legitimate mail as spam, or we are going to have to wipe the auto-learning database. So, in my opinion, better safe than sorry: if there are any other obvious telltale signs of the email being spam, our other rules should catch it.
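
SpamAssassin expresses the Bayes probability through a ladder of BAYES_xx rules, each covering a probability band. A Python sketch that maps a probability to its rule and prints `score` lines for local.cf. The band edges are the stock ones as I recall them from SpamAssassin 3.3's rule set (check your version's rules; newer releases also add BAYES_999), and the scores are hypothetical values following the approach above: points deducted at the low end, a high score near 99%.

```python
# Stock BAYES_xx probability bands (as recalled for 3.3.x), each [low, high).
BAYES_BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]

# Hypothetical scores: deduct at low probabilities, score high near 99%.
EXAMPLE_SCORES = {
    "BAYES_00": -2.0, "BAYES_05": -1.0, "BAYES_20": -0.5, "BAYES_40": 0.0,
    "BAYES_50": 0.8, "BAYES_60": 1.5, "BAYES_80": 2.5, "BAYES_95": 3.5, "BAYES_99": 5.0,
}


def bayes_rule(probability):
    """Name of the BAYES_xx rule whose band contains this spam probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for name, low, high in BAYES_BANDS:
        if low <= probability < high:
            return name
    return "BAYES_99"  # probability == 1.0 belongs to the top band


def score_lines(scores):
    """Format rule scores as local.cf 'score' lines."""
    return [f"score {name} {value:.3f}" for name, value in scores.items()]
```

A 10% probability lands in BAYES_20, so with these example numbers it subtracts 0.5 points instead of adding any.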

You will be able to see how I went about all these score increases and decreases in my example configuration (which I will attach to this post). Again, I'm fairly certain that I have used an upper, throw-away spam score of about 10 in these examples.

f) Take advantage of other great tools available.

I find that it's handy to have a few main back-end servers, with several queue and relay servers scattered through the cloud. These servers come equipped with basic DNSBL lookup functionality and greylisting. Virtualmin uses the greylisting plugin (or program) for rate limiting mail, but with a little bit of effort you can repurpose it, or rather use it for its intended purpose, and perform greylisting on your perimeter mail servers. Greylisting basically turns away mail that might be spam with a temporary error and asks the sending server to try again in, say, 15 minutes. This trips up most spam bots, and the junk mail never returns.
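
A toy Python model of the greylisting decision (the class name is hypothetical; real postgrey has many more options). The 300-second delay and 35-day expiry mirror postgrey's documented defaults; postgrey also keys on the client's /24 network by default rather than the exact IP, which this sketch skips. The return strings are Postfix policy responses: DEFER_IF_PERMIT gives a temporary (4xx) refusal, DUNNO lets the other restrictions decide.

```python
import time

DEFER = "DEFER_IF_PERMIT Greylisted, please try again later"


class Greylist:
    """Greylisting keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, delay=300, max_age=35 * 86400):
        self.delay = delay        # seconds a new triplet must wait before it is accepted
        self.max_age = max_age    # forget triplets not seen for this many seconds
        self.first_seen = {}
        self.last_seen = {}

    def check(self, client_ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None or now - self.last_seen[key] > self.max_age:
            # Unknown (or expired) triplet: remember it and ask the sender to retry.
            self.first_seen[key] = now
            self.last_seen[key] = now
            return DEFER
        self.last_seen[key] = now
        if now - first < self.delay:
            return DEFER          # retried too soon
        return "DUNNO"            # retried after the delay: pass it on
```

A real mail server retries and gets through on the first attempt after the delay; a spam bot that fires once and moves on never does.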

g) Monitor, clean, and organize your auto-whitelist.

The auto-whitelist is a source of many problems for people. It doesn't just whitelist things; rather, it applies a score adjustment to any mail from a specific sender. In effect, it can technically be both a whitelist and a blacklist. It also operates using an equation, which looks like this:

finalscore = score + (mean - score) * auto_whitelist_factor

where mean is the average score of that sender's earlier mail, and auto_whitelist_factor can be set by you. Mine is typically set to 0.5.
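
A worked example of that formula in Python (the function name is hypothetical). With a factor of 0.5, a message scoring 8.0 from a sender whose history averages -2.0 ends up at 8.0 + (-2.0 - 8.0) * 0.5 = 3.0, i.e. below a typical threshold; the same pull works the other way for a sender with a spammy history.

```python
def awl_adjusted(score, mean, factor=0.5):
    """Auto-whitelist adjustment: finalscore = score + (mean - score) * factor.

    score  -- this message's score before the AWL adjustment
    mean   -- average score of earlier mail from the same sender (as the AWL identifies it)
    factor -- auto_whitelist_factor; 0 ignores the history, 1 replaces the score with the mean
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("auto_whitelist_factor is expected to be between 0 and 1")
    return score + (mean - score) * factor
```

Conversely, `awl_adjusted(2.0, 9.0)` gives 5.5: a harmless-looking message gets pulled up towards the spam threshold because of who it claims to be from.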

Make sure it hasn't sucked up legitimate addresses and started applying spam-level scores to them, and make sure these whitelisted addresses get a negative score. You would be surprised how often this can get turned around by accident.

At the same time, make sure anything that should be spam is getting a positive score, not a negative one. Some spam will leak through no matter what you do, and if this is happening, double-check that it didn't make its way into your users' auto-whitelist. Sometimes it's easiest to just clear it every so often.

IN CONCLUSION:

It can take a while, and there are a lot of things to take into consideration, but once it's done, and done right, SpamAssassin will be your friend. Most mail servers I set up take a few weeks of monitoring after the initial launch, but after this trial period I almost never have a serious problem creep up on me.

Thanks for taking the time to read this! I hope it helped to some degree. Like I said before, I'm going to point to this as an example in the future, so if I went over things you already know or understand, just ignore them!

And remember: a poorly configured spam filter is probably worse than having no spam filter at all.

I hope you get it working nicely for yourself.

My final thought would have to be this: as with any complicated piece of software and any diverse system, it is only as good as you design it to be. SpamAssassin is not "crap"; it's only "crap" if it was set up like "crap" :P. And that's understandable, since it isn't exactly straightforward.

SpamAssassin is a spam filter toolkit, per se. It really doesn't work all that great out of the box, but neither does any other industry-standard spam filter out there (at least that I am aware of). The same can be said about a lot of applications and services when it comes to web hosting.

Any serious shared host will be sure to carefully examine and benchmark critical parts of their infrastructure. A spam filter can be very handy, for sure, but it can also be a nightmare if it isn't used properly.

It can literally mean the difference between getting that reply from an employer or not, getting an important email from mum, getting an emergency message from your staff, and so much more. In any production environment, one must take careful consideration when it comes to this part of the infrastructure.

Anyway, an example configuration for SpamAssassin has been attached. Don't just copy and paste it; rather, use it as a starting point or for reference.

And again, I hope this was helpful!

Take care!

Thu, 10/09/2014 - 23:16
flameproof

@rapidwebs Thanks for the reply. I got some good ideas from your long post. My local.cf was basically the vanilla one.

To better check what is working, I have turned off all SPAM deletion for now. I get some annoying SPAM that uses newly created .eu, .me and .link domains, usually just a day old and used only once. I added:

uri      EU_TLD  /\.eu(?::\d+)?(?:\/|$)/i
describe EU_TLD  Contains a URL in the EU top-level domain
score    EU_TLD  5.0

uri      ME_TLD  /\.me(?::\d+)?(?:\/|$)/i
describe ME_TLD  Contains a URL in the ME top-level domain
score    ME_TLD  5.0

uri      LINK_TLD  /\.link(?::\d+)?(?:\/|$)/i
describe LINK_TLD  Contains a URL in the LINK top-level domain
score    LINK_TLD  5.0
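
A quick way to sanity-check uri patterns like these before restarting SpamAssassin is to run them over sample URLs. A Python sketch using the same three patterns (these particular regex constructs mean the same thing in Perl and Python); the helper name is hypothetical. It also shows a side effect: the patterns match `.eu/` anywhere in the URL, including a path segment, not just the host name.

```python
import re

# The same patterns as the uri rules above, case-insensitive like their /i flag.
TLD_RULES = {
    "EU_TLD": re.compile(r"\.eu(?::\d+)?(?:\/|$)", re.IGNORECASE),
    "ME_TLD": re.compile(r"\.me(?::\d+)?(?:\/|$)", re.IGNORECASE),
    "LINK_TLD": re.compile(r"\.link(?::\d+)?(?:\/|$)", re.IGNORECASE),
}


def hits(url):
    """Sorted names of the rules whose pattern matches somewhere in the URL."""
    return sorted(name for name, pattern in TLD_RULES.items() if pattern.search(url))
```

`hits("http://example.com/files.eu/report")` returns `["EU_TLD"]` even though the host is a .com, so expect the occasional false positive at a score of 5.0.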

...and a few more things I don't want to post because I am not fully happy with them yet.

Fri, 10/10/2014 - 20:01
rapidwebs

Great news! Eventually, once you get the hang of it, it's not so bad really.

And sorry about the eyesore of a post ^_^

I would have liked to just link you to the one I wrote before, but I wasn't able to find it.

And I figured that in the future, somebody is going to need this information again.

I'll have to clean up the post some time, though, and maybe add it to a wiki somewhere. Does Virtualmin have a wiki? I'll have to take a look.

Regards, Steven

Sun, 11/16/2014 - 19:51
flameproof

@rapidwebs Strange thing, but my SPAM problem is gone (at least for now). I moved to a new VPS, and after a fresh install the SPAM system works.

Old server: 286MB - SpamAssassin version 3.3.1 - spamassassin (standalone program)
New server: 1GB - SpamAssassin version 3.3.1 - spamc (client for the SpamAssassin filter server spamd)

With the new server, setting up SPAM protection wasn't my highest priority, so my local.cf is just:

required_hits 5
report_safe 0
rewrite_header subject [SPAM _HITS_]

required_score 3

Note: [SPAM _HITS_] will include the SPAM score in the Subject header, which can be quite useful.

Since it's really working very well now I will just keep it like that for now.

BTW, the new VPS has almost 4 times the memory at nearly half the price. It runs totally smooth and fast (CentOS 6.6/86, KVM virtualisation).

Mon, 11/17/2014 - 15:19 (Reply to #20)
Joe

I'd be willing to bet that the reason things weren't working on the old system was that spinning up the spamassassin process was failing due to memory allocation errors. The standalone program uses no memory when it is not processing mail, but it has to allocate a new (rather large) block of memory whenever mail is received. Given that the system was running out of memory when simply installing packages, I'm sure the big SpamAssassin process would occasionally (maybe always) fail too, for the same reason.

Processing mail is the most memory-intensive function on many Virtualmin systems. SpamAssassin is quite large, and ClamAV is huge. You need about 130MB free (I think, though it's been a while since I did the math) in order to start both. On a 286MB system with all the other services running, that may just not be possible.
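
Joe's rough 130MB figure can be checked before blaming SpamAssassin itself. A minimal Python sketch (helper names hypothetical) that estimates free memory from /proc/meminfo: MemAvailable where the kernel provides it, otherwise MemFree + Buffers + Cached, as on older kernels like CentOS 5's. Inside OpenVZ/Virtuozzo containers /proc/meminfo may not reflect the container's real limits, so treat the result as an estimate.

```python
def available_mb(meminfo_text):
    """Estimate free RAM in MB from the text of /proc/meminfo (values there are in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    if "MemAvailable" in fields:  # newer kernels compute this estimate themselves
        kb = fields["MemAvailable"]
    else:                         # older kernels: approximate it
        kb = fields.get("MemFree", 0) + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return kb // 1024


def can_start_filters(meminfo_text, needed_mb=130):
    """True if the estimate covers the (rough) memory needed to start SpamAssassin and ClamAV."""
    return available_mb(meminfo_text) >= needed_mb
```

Fed the old VPS's figures from the `free -m` output earlier in the thread (102MB free, nothing in buffers or cache), it reports 102MB, short of the 130MB estimate. Use `can_start_filters(open("/proc/meminfo").read())` on a live system.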

Mon, 11/17/2014 - 22:58
flameproof

Joe, my old VPS was stalling quite often. That could have been the reason. I wish I had known that a few years ago.

Anyway, it's really nice to see Virtualmin run on a fast system.
