Bayesian Analysis

Bayesian analysis is a filter that calculates the probability of a message being spam based on how often its words have appeared in earlier spam and good mail. Unlike simple content-based filters, Bayesian spam filtering learns from both spam and good mail, which makes it a robust, adaptive, and efficient anti-spam approach that returns very few false positives.
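As a rough illustration of the idea, the following Python sketch combines per-word evidence into a single spam probability. It is a generic naive Bayes calculation, not the code Xeams uses; the function names, the add-one smoothing, and the sample counts are all assumptions made for the example.

```python
import math

def word_spam_probability(word, spam_counts, ham_counts):
    """Estimate P(spam | word) from word counts (hypothetical helper)."""
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    # Add-one smoothing (an assumption) keeps unseen words away from 0 or 1.
    p_word_spam = (spam_counts.get(word, 0) + 1) / (spam_total + 2)
    p_word_ham = (ham_counts.get(word, 0) + 1) / (ham_total + 2)
    return p_word_spam / (p_word_spam + p_word_ham)

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def message_spam_probability(words, spam_counts, ham_counts):
    """Combine per-word probabilities, treating words as independent."""
    log_odds = 0.0
    for word in set(words):
        p = word_spam_probability(word, spam_counts, ham_counts)
        log_odds += math.log(p) - math.log(1.0 - p)
    return _sigmoid(log_odds)

# Made-up counts, for illustration only.
spam_counts = {"viagra": 20, "meeting": 1}
ham_counts = {"meeting": 30}

print(message_spam_probability(["viagra"], spam_counts, ham_counts))
print(message_spam_probability(["meeting"], spam_counts, ham_counts))
```

With these sample counts, a message containing only "viagra" scores above 0.5 and one containing only "meeting" scores below it. A message with no words at all stays at exactly 0.5, which mirrors a filter that has no evidence either way.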

Bayesian analysis is an adaptive filter: it learns and improves as it encounters more email. When you first install Xeams, the Bayesian filter runs in learning mode because it does not yet have enough data to decide whether an email is junk or good. In learning mode, the filter relies on the other filters to assign a score to the message while it updates its own database.

Bayesian Repositories

The Bayesian filter manages two databases:
  1. History of spam word count
  2. History of good word count
These databases are stored in the config folder in Xeams, in the files SpamWords_001.dat and HamWords_001.dat respectively. Both are plain text files with two columns: the word count and the word itself.
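If you want to inspect these files with a script, a minimal Python sketch might look like the following. The column separator is not documented here, so the code assumes the count and the word are separated by whitespace; load_word_counts is a hypothetical helper, and the sample lines are made up.

```python
def load_word_counts(lines):
    """Parse 'count word' lines into a dict (whitespace separator is an assumption)."""
    counts = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or single-column lines
        count_text, word = parts
        try:
            count = int(count_text)
        except ValueError:
            continue  # skip lines whose first column is not a number
        counts[word.strip()] = count
    return counts

# Made-up sample content in the assumed layout.
sample = ["12 free\n", "3 meeting\n", "\n", "not a count\n"]
print(load_word_counts(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly. Work on a copy of the file rather than the one Xeams is using.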

Bayesian Graduation

To avoid excessive memory usage, the Bayesian filter stops learning automatically once it reaches a certain word count. This point is called graduation. After graduation, the filter learns only when a user explicitly marks a message as good or spam.
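The graduation rule can be pictured with a short Python sketch. This is only a model of the behavior described above, not Xeams code: the class name, the threshold value, and the choice to count distinct words rather than total occurrences are all assumptions.

```python
class LearningGate:
    """Toy model of graduation: stop automatic learning past a word limit."""

    def __init__(self, graduation_word_count):
        # Hypothetical threshold; the real limit is not documented here.
        self.graduation_word_count = graduation_word_count
        self.counts = {}

    @property
    def graduated(self):
        # Assumption: the limit applies to the number of distinct words.
        return len(self.counts) >= self.graduation_word_count

    def learn(self, words, user_marked=False):
        """Update counts unless graduated; user-marked messages always count."""
        if self.graduated and not user_marked:
            return False
        for word in words:
            self.counts[word] = self.counts.get(word, 0) + 1
        return True

gate = LearningGate(graduation_word_count=2)
print(gate.learn(["cheap", "pills"]))            # automatic learning, accepted
print(gate.learn(["lottery"]))                   # ignored after graduation
print(gate.learn(["lottery"], user_marked=True)) # explicit marking still learns
```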

Bayesian Scoring

Bayesian scoring works in both directions: if the filter determines an email to be junk, it assigns a positive score; if it considers the email good, it assigns a negative score.
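One way to picture a two-directional score is to map a spam probability onto a signed range, as in the Python sketch below. The linear mapping and the max_score value are illustrative assumptions; the formula and score range that Xeams actually uses are not described here.

```python
def bayesian_score(spam_probability, max_score=50):
    """Map a probability in [0, 1] to a signed score (hypothetical mapping)."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    # 0.5 is neutral; above it the score is positive, below it negative.
    return round((spam_probability - 0.5) * 2 * max_score)

print(bayesian_score(1.0))  # likely junk: positive score
print(bayesian_score(0.5))  # undecided: zero
print(bayesian_score(0.0))  # likely good: negative score
```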

Improving Bayesian Filter

When a message is incorrectly tagged as good or junk, mark it correctly in the message repository. Marking a message updates the Bayesian database.
We recommend that you first run the message through the Spam Simulator to check whether Bayesian analysis will benefit from learning. There is no point in making the Bayesian filter learn from a message if it already categorizes that email correctly.