cornerhost wiki   FAQ/SpamAssassin UserPreferences
 

  1. basic spamassassin configuration
  2. whitelisting / blacklisting
  3. How do I change the threshold level?
  4. SpamAssassin on secondary account
  5. Why isn't autolearning working?
  6. How can I train it?
  7. Real World Example
  8. OK, What the heck am I to do with the example?

basic spamassassin configuration

For SpamAssassin, create a file called ~/.procmailrc with the following lines:

:0fw
| spamc
## .. And if you want to filter the spam to another folder
## as opposed to just marking it as spam for your mail
## client to take care of, add these three lines:
:0:
* ^X-Spam-Status: Yes
mail/spam

Make sure to use ASCII mode if you upload the .procmailrc file with FTP. Otherwise your mailbox will get corrupted!!
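
The ASCII-mode warning is presumably about line endings: a file uploaded in binary mode from a Windows machine keeps its carriage returns, so folder names and commands in the recipe end up with a stray character on the end. Here is a minimal sketch for checking and repairing the file from a shell on the server. The function names has_crlf and strip_crlf are made up for this example.

```shell
#!/bin/sh
# has_crlf FILE: succeed if FILE contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_crlf FILE: remove every carriage return from FILE.
# The file is rewritten through a temporary copy so its permissions are kept.
strip_crlf() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Example: check the uploaded file and fix it if needed.
# if has_crlf "$HOME/.procmailrc"; then strip_crlf "$HOME/.procmailrc"; fi
```

If has_crlf reports nothing, the upload was fine and there is nothing to repair.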

whitelisting / blacklisting

SpamAssassin gets its configuration from a file (that you would have to create) called ~/.spamassassin/user_prefs.

To always accept mail from a particular email address, add the following command to that file:
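
SpamAssassin's whitelist_from directive accepts mail from an address, and blacklist_from does the opposite. Here is a sketch of adding such lines from the shell, assuming the preferences file is ~/.spamassassin/user_prefs. The helper add_pref and both addresses are placeholders invented for this example.

```shell
#!/bin/sh
# add_pref FILE LINE: append LINE to FILE unless an identical line is already
# there. Creates FILE (and its directory) if it does not exist yet.
add_pref() {
    mkdir -p "$(dirname "$1")"
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example (placeholder addresses, replace with your own):
# add_pref "$HOME/.spamassassin/user_prefs" "whitelist_from friend@example.com"
# add_pref "$HOME/.spamassassin/user_prefs" "blacklist_from *@spammer.example"
```

Running the same add_pref call twice leaves only one copy of the line, so it is safe to repeat.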

You can see the other possible spamassassin commands here:

There is a really cool SpamAssassin configuration file creator at:

How do I change the threshold level?

Put a required_hits line in your ~/.spamassassin/user_prefs file. From the docs:

required_hits n.nn (default: 5)

Set the number of hits required before a mail is considered spam. n.nn can be an integer or a real number. 5.0 is the default setting, and is quite aggressive; it would be suitable for a single-user setup, but if you're an ISP installing SpamAssassin, you should probably set the default to be more conservative, like 8.0 or 10.0. It is not recommended to automatically delete or discard messages marked as spam, as your users will complain, but if you choose to do so, only delete messages with an exceptionally high score such as 15.0 or higher.
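
To make the thresholds in that passage concrete, here is a small sketch that sorts a score into the three bands it describes: below the required score, at or above it, and high enough that deleting might be considered. The function spam_action and its labels are invented for this example and are not part of SpamAssassin.

```shell
#!/bin/sh
# spam_action SCORE [REQUIRED] [DISCARD]
# Prints "ham" when SCORE is below REQUIRED (default 5.0), "spam" when it is
# at or above REQUIRED, and "discard" when it is at or above DISCARD
# (default 15.0). awk does the comparison because sh has no decimal arithmetic.
spam_action() {
    awk -v s="$1" -v r="${2:-5.0}" -v d="${3:-15.0}" 'BEGIN {
        if (s + 0 >= d + 0)      print "discard"
        else if (s + 0 >= r + 0) print "spam"
        else                     print "ham"
    }'
}

# Example:
# spam_action 6.2        # default threshold of 5.0
# spam_action 6.2 8.0    # more conservative threshold
```

With the default threshold a score of 6.2 counts as spam; with a threshold of 8.0 the same message is delivered normally.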

SpamAssassin on secondary account

Log on to the secondary account using the secondary account's userid/password and follow the same steps as above to add a .procmailrc file and, optionally, the SpamAssassin directory/preferences file (~/.spamassassin/user_prefs).

Why isn't autolearning working?

See here: http://wiki.apache.org/spamassassin/AutolearningNotWorking

How can I train it?

The basic tool is called sa-learn. Run "sa-learn --spam" on a mailbox of spam and "sa-learn --ham" on a mailbox of legitimate mail.

So the basic idea looks like this:
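
A minimal sketch of that idea, assuming mbox-format mailboxes and the --mbox option used elsewhere on this page. The wrapper learn_mbox and the SA_LEARN variable are invented for this example. Setting SA_LEARN=echo gives a dry run that only prints the arguments.

```shell
#!/bin/sh
# SA_LEARN can be pointed at another command (for example "echo") for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_mbox KIND FILE: KIND is "spam" or "ham", FILE is an mbox mailbox.
# Empty or missing mailboxes are skipped without calling sa-learn.
learn_mbox() {
    case "$1" in
        spam|ham) ;;
        *) echo "usage: learn_mbox spam|ham FILE" >&2; return 2 ;;
    esac
    [ -s "$2" ] || return 0
    $SA_LEARN "--$1" --mbox "$2"
}

# Example:
# learn_mbox spam "$HOME/mail/SpamToLearn"
# learn_mbox ham  "$HOME/mail/HamToLearn"
```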

Another possibility is to just forward the mail to spam@yourdomain or ham@yourdomain, and use procmail to train on mail sent to these addresses. There's an example here:

Real World Example

~/.procmailrc:

:0fw
| spamc
:0:
* ^X-Spam-Status: Yes
mail/SpamDetected

~/.spamassassin/user_prefs:

# How many hits before a message is considered spam.
required_hits           5.0

# Whether to change the subject of suspected spam
#rewrite_subject         1

# Text to prepend to subject if rewrite_subject is used
#rewrite_header Subject [SPAM]

# Encapsulate spam in an attachment
report_safe             0

# Use terse version of the spam report
use_terse_report        0

# Enable the Bayes system
use_bayes               1

# Enable Bayes auto-learning
auto_learn              1

# Enable or disable network checks
skip_rbl_checks         0
use_razor2              1
use_dcc                 1
use_pyzor               1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages            all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales              all

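~/learnspam (make it executable with "chmod +x"):

#!/bin/sh
# Learn from the hand-sorted folders, sync the Bayes database once, then empty the folders.
sa-learn --spam --showdots --mbox --no-sync "$HOME/mail/SpamToLearn"
sa-learn --ham --showdots --mbox --no-sync "$HOME/mail/HamToLearn"
sa-learn --sync
: > "$HOME/mail/SpamToLearn"
: > "$HOME/mail/HamToLearn"

crontab entry (runs learnspam every day at 5:30 am):

30 5 * * *  nice $HOME/learnspam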

OK, What the heck am I to do with the example?

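You need to put the files above in place on the server, create the SpamToLearn and HamToLearn mail folders there, and set up a filter in your mail program (MUA) that moves messages scoring 1.1 or higher into a SpamSuspects folder.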

SpamAssassin on the server puts the spam it detects in the SpamDetected folder, which is not downloaded to your mail program over a POP connection. This is the first level of spam filtering and will, over time, catch 90% of your spam.

The filter you created in your MUA will catch the next 10% of your spam (anything that scores 1.1 or higher) and put it in the SpamSuspects folder. However, you need to tell the server that it missed some spam so that it becomes more accurate. Copy the spam messages from your Inbox or from SpamSuspects to the "SpamToLearn" folder on the server via the IMAP connection.

Any non-spam messages that are in the SpamDetected folder on the server should be moved to HamToLearn on the server so that the server can learn those messages.

The "learnspam" script runs every day at 5:30 am and learns from your spam and ham so that the filter becomes more accurate at identifying spam. Training takes hundreds or thousands of messages, so if you have an existing set of spam messages, copy them to SpamToLearn to seed the server with a good set of spam.

After running this setup for a month or so, I find that my ham scores between -3.0 and 1.0 and my spam scores between 2.0 and 15.0. After you get comfortable with the scoring of your spam and ham, you may want to adjust the required_hits setting in .spamassassin/user_prefs on the server.
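
To find your own score ranges before changing required_hits, you can pull the scores out of a mailbox. The sketch below assumes headers in the "score=N.N" form that SpamAssassin 3.x writes. The function names spam_scores and score_range are invented for this example. Run it separately on a folder of ham and a folder of spam to compare the two ranges.

```shell
#!/bin/sh
# spam_scores FILE: print the score from every X-Spam-Status header in FILE,
# one per line.
spam_scores() {
    sed -n 's/^X-Spam-Status: .*score=\(-\{0,1\}[0-9][0-9.]*\).*/\1/p' "$1"
}

# score_range FILE: print the lowest and highest score found in FILE,
# or nothing if FILE has no X-Spam-Status headers.
score_range() {
    spam_scores "$1" | awk '
        NR == 1      { min = $1 + 0; max = $1 + 0 }
        $1 + 0 < min { min = $1 + 0 }
        $1 + 0 > max { max = $1 + 0 }
        END          { if (NR) print min, max }'
}

# Example:
# score_range "$HOME/mail/SpamDetected"
```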

You can verify that SpamAssassin is working by looking at the headers of emails you have received. SpamAssassin adds headers such as the following:

X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on mercury.sabren.com
X-Spam-Level: 
X-Spam-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00,NO_REAL_NAME  autolearn=no version=3.0.2
and
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on mercury.sabren.com
X-Spam-Level: ******************
X-Spam-Status: Yes, score=18.4 required=5.0 tests=BAYES_60,DOMAIN_RATIO, DRUGS_ERECTILE,
   DRUG_DOSAGE,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS, HTML_90_100,HTML_IMAGE_ONLY_08,HTML_MESSAGE,
   INVALID_DATE, MIME_HTML_MOSTLY,MIME_QP_LONG_LINE,MPART_ALT_DIFF, MSGID_OUTLOOK_INVALID,
   NO_REAL_NAME,RAZOR2_CF_RANGE_51_100, RAZOR2_CHECK,
   SUBJECT_DRUG_GAP_VIA autolearn=spam version=3.0.2
X-Spam-Report: *  0.3 SUBJECT_DRUG_GAP_VIA Subject contains a gappy version of 'viagra' 
   *  0.0 NO_REAL_NAME From: does not include a real name 
   *  0.2 INVALID_DATE Invalid Date: header (not RFC 2822) 
   *  2.1 HEAD_ILLEGAL_CHARS Header contains too many raw illegal characters 
   *  2.7 MSGID_OUTLOOK_INVALID Message-Id is fake (in Outlook Express format) 
   *  2.7 FORGED_YAHOO_RCVD 'From' yahoo.com does not match 'Received' headers 
   *  0.9 DRUG_DOSAGE BODY: Talks about price per dose 
   *  3.2 DOMAIN_RATIO BODY: Message body mentions many internet domains 
   *  0.4 BAYES_60 BODY: Bayesian spam probability is 60 to 80%
   *      [score: 0.6681]
   *  1.0 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME 
   *  0.0 HTML_MESSAGE BODY: HTML included in message 
   *  3.0 HTML_IMAGE_ONLY_08 BODY: HTML: images with 400-800 bytes of words 
   *  0.1 MPART_ALT_DIFF BODY: HTML and text parts are different 
   *  0.1 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence level above 50%
   *      [cf: 100]
   *  0.0 HTML_90_100 BODY: Message is 90% to 100% HTML 
   *  0.0 MIME_QP_LONG_LINE RAW: Quoted-printable line longer than 76 chars 
   *  1.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/) 
   *  0.2 DRUGS_ERECTILE Refers to an erectile drug
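
The score in X-Spam-Status is the sum of the per-rule scores listed in X-Spam-Report. Here is a sketch that adds them up, assuming the header has been saved to a file and each rule line has the form "*  2.1 RULE_NAME description". The function name report_total is invented for this example.

```shell
#!/bin/sh
# report_total FILE: add up the per-rule scores in a saved X-Spam-Report header.
# The first rule sits on the "X-Spam-Report:" line itself, so that prefix is
# stripped first. Detail lines such as "*      [score: 0.6681]" are ignored
# because their second field is not a number.
report_total() {
    sed 's/^X-Spam-Report: *//' "$1" |
    awk '$1 == "*" && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { total += $2 }
         END { printf "%.1f\n", total }'
}

# Example:
# report_total saved-report.txt
```

For the sample report above, the listed rule scores add up to 18.4, which matches the score shown in its X-Spam-Status line.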

