Information Infrastructure @ ANU The Australian National University


Spam Filtering


 
Spam filtering is a black art, and unfortunately there is no easy way to detect spam. In the past we ran an access control list that let us block known "bad" email addresses or domains from which junk mail originated. This method still works to a limited extent; however, spammers have become more sophisticated over time and use a number of ways to get around these simple blocks.
 
With the proliferation of ISPs and free email accounts, many spammers simply apply for one of these accounts and use it for as long as they can. Often, within hours of the junk email being sent, the ISP is alerted and closes the account. Consequently, by the time an offending email address is reported to us for blocking, the damage has already been done and the address is no longer valid. Adding it to our access control list is then a waste of time.

The more usual practice now is for the spammer to forge the sender's email address and even the machine from which the mail appears to come. This explains why you can now receive junk mail from what appear to be legitimate sites, or even from yourself! If we added these addresses to our access control list, we would be blocking legitimate email.

So what can be done?

The system we have implemented on the ANU email gateway machines is a content filtering program, MIMEDefang. The software opens each message and performs a number of checks for viruses, suspicious characters, dangerous attachments and, finally, spam.

The spam filtering is done by looking at the content of the message and scoring it against a set of rules. Each rule is worth a number of points, and at the end of scanning the message the system simply adds up the points the message has scored. The more points a message scores, the more likely it is to be spam.

Not all rules are weighted equally, and there are both positive and negative scoring rules. For instance, many junk emails contain the phrase FREE!!!, so this scores 1.2 points. However, the phrase could also appear in a legitimate email, so other rules would need to be triggered before the message was marked as spam.
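As a rough illustration of how additive rule scoring works, the following Python sketch sums the points of every rule that matches a message. It is not the gateway's actual rule set: only the 1.2 points for FREE!!! comes from the text above, and the rule names, the other patterns and their weights are invented for the example.

```python
import re

# Hypothetical rules: (name, pattern, points). Only the 1.2 points for
# "FREE!!!" is taken from the text; everything else is made up.
RULES = [
    ("FREE_SHOUT", re.compile(r"FREE!!!"), 1.2),
    ("REMOVE_ME", re.compile(r"to be removed", re.IGNORECASE), 2.0),
    # A negative rule: quoted reply text makes a message look less like spam.
    ("QUOTED_REPLY", re.compile(r"^> ", re.MULTILINE), -0.5),
]


def score_message(body):
    """Return (total score, names of the rules that matched)."""
    matched = [(name, points) for name, pattern, points in RULES
               if pattern.search(body)]
    # Round to one decimal place to hide floating-point noise in the sum.
    total = round(sum(points for _, points in matched), 1)
    return total, [name for name, _ in matched]


if __name__ == "__main__":
    print(score_message("Get it FREE!!! Reply to be removed."))
```

A single matching phrase contributes only its own points, so a legitimate message that happens to contain FREE!!! stays well below any sensible blocking score unless other rules also fire.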

How many points does it need to make it spam?

The standard answer to this is "How long is a piece of string?" In other words, there is no clear-cut boundary between spam and non-spam. In our testing of the service since September 2002, we have not detected any legitimate mail that scored more than 15 points. From experience, most spam scores 4.5 or greater and most legitimate mail does not score above 8 points. Based on this testing, the ANU mail gateways now block any mail that scores greater than 15 points and refuse to accept it for delivery. Even with the ANU blocking at 15, a lot of mail that could be spam will still reach your mailbox.

Why choose a score so high?

The reason for choosing such a high value is that at 15 we know the message is more than 99.9% likely to be spam. Blocking at a lower score could mean blocking legitimate email messages for users, and we wanted to avoid this if at all possible. Instead of having the system provide the definitive blocking mechanism, we tag all mail with some additional headers so that users can decide where the block should occur.
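The gateway policy described above can be sketched in a few lines of Python. The function and constant names are hypothetical; the only facts taken from the text are the threshold of 15 and that mail is refused only when it scores greater than 15.

```python
BLOCK_THRESHOLD = 15.0  # from the text: scores greater than 15 are refused


def gateway_decision(score, threshold=BLOCK_THRESHOLD):
    """Return 'reject' for mail above the threshold, otherwise 'tag'."""
    # Strictly greater than: a message scoring exactly 15 is still accepted
    # and tagged, leaving the final decision to the user.
    return "reject" if score > threshold else "tag"


if __name__ == "__main__":
    for s in (8.3, 15.0, 15.1):
        print(s, gateway_decision(s))
```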

Header Tag        Interpretation
X-Spam-Status:    This will tell you if the message has been scanned or not.
X-Spam-Score:     This field will contain a number of asterisks followed by the actual spam score in brackets.
X-Spam-Tests:     This is a list of tests which were matched by the system, resulting in the total score.
X-Scanned-By:     This field simply gives the version of the software which was used.

Example:

X-Spam-Status: Scanned
X-Spam-Score: * (8.3)
X-Spam-Tests: NO_REAL_NAME,REMOVE_SUBJ,EXCUSE_7,EXCUSE_3,SUPERLONG_LINE,MSG_ID_ADDED_BY_MTA_2
X-Scanned-By: MIMEDefang 2.15 (www dot roaringpenguin dot com slash mimedefang)


Each asterisk currently represents 5 points, which gives users an easy way to create simple filtering rules in their mail client.
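To make the asterisk encoding concrete, here is a small Python sketch that builds an X-Spam-Score value from a numeric score. It assumes that partial blocks of 5 points are rounded down, which matches the example above (8.3 gives one asterisk) but is not confirmed for other values; the function name is hypothetical, and how the real software formats scores below 5 is not known.

```python
import math

POINTS_PER_STAR = 5  # from the text: each asterisk currently represents 5 points


def format_spam_score(score):
    """Return a value such as '* (8.3)' for the X-Spam-Score header."""
    # Assumption: one asterisk per complete 5 points, never a negative count.
    stars = "*" * max(0, math.floor(score / POINTS_PER_STAR))
    return f"{stars} ({score:.1f})"


if __name__ == "__main__":
    print(format_spam_score(8.3))
    print(format_spam_score(12.0))
```

Under this assumption, a mail client rule that looks for two asterisks in the X-Spam-Score header would catch everything scoring 10 points or more.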

 How do I filter my mail?

As previously mentioned, spam filtering is not an exact science, and you will receive mail marked as spam which shouldn't be. For this reason we DO NOT recommend that you automatically delete mail. Instead, move any mail marked as spam out of your mail folder and into a separate folder, which you should review before deleting. The following is a list of recipes that users can use to configure their mail client to use the spam score. This is not an exhaustive list, and we will add to it over time.
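For readers who script their own mail handling, the idea behind such a recipe can be sketched in Python using the standard email module. This is an illustration only, not one of the supported client recipes: the folder names and the personal threshold of 5 points are hypothetical choices, and the message is filed for review rather than deleted.

```python
import re
from email import message_from_string

# Matches the numeric score in brackets, e.g. the 8.3 in "* (8.3)".
SCORE_RE = re.compile(r"\((-?\d+(?:\.\d+)?)\)")


def choose_folder(raw_message, threshold=5.0):
    """Pick a folder name from the X-Spam-Score header of a raw message."""
    msg = message_from_string(raw_message)
    match = SCORE_RE.search(msg.get("X-Spam-Score", ""))
    if match and float(match.group(1)) >= threshold:
        # Hypothetical folder name; review its contents before deleting.
        return "Possible-Spam"
    # Untagged or low-scoring mail stays where it is.
    return "INBOX"


if __name__ == "__main__":
    sample = (
        "X-Spam-Status: Scanned\n"
        "X-Spam-Score: * (8.3)\n"
        "Subject: hello\n"
        "\n"
        "Message body\n"
    )
    print(choose_folder(sample))
```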


But it's not SPAM!

If you continue to receive mail from a particular address which is being marked as spam, we can "whitelist" this address. The whitelist is a list of trusted email addresses whose mail is not scanned for spam. To have an email address added to this list, please email the address to postmaster@anu.edu.au along with details of why it should be added to the whitelist. Submitted email addresses will be reviewed before being accepted and added to the whitelist.



The information on this page was updated on Tue, 15 Apr 2003. The page has been authorised by the Director, IIS as relevant officer.
© 2000 The Australian National University