Bayesian analysis is a filter that calculates the probability of a message being spam based on how often the words it contains have appeared in earlier spam and good mail. Unlike simple content-based filters, Bayesian spam filtering learns from
both spam and good mail, resulting in a robust, adaptive, and efficient anti-spam approach that, best of all, returns hardly
any false positives.
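To make the idea concrete, here is a minimal sketch of how a spam probability can be derived from two word-count histories. It uses a textbook naive Bayes calculation with add-one smoothing and equal priors. This is an illustration only and is not necessarily the formula Xeams uses. The function name and the sample counts are made up for the example.

```python
import math

def spam_probability(words, spam_counts, ham_counts):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing.

    Illustrative only: equal priors are assumed, and this is not
    claimed to be the exact calculation Xeams performs.
    """
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    vocab = len(set(spam_counts) | set(ham_counts))

    # Sum log-likelihoods so long messages do not underflow to zero.
    log_spam = 0.0
    log_ham = 0.0
    for w in words:
        log_spam += math.log((spam_counts.get(w, 0) + 1) / (spam_total + vocab))
        log_ham += math.log((ham_counts.get(w, 0) + 1) / (ham_total + vocab))

    diff = log_ham - log_spam
    if diff > 700:  # math.exp would overflow; the message is clearly good
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# Made-up histories: word -> number of times seen in spam / good mail.
spam_counts = {"viagra": 40, "free": 30, "meeting": 2}
ham_counts = {"meeting": 35, "free": 5, "report": 20}

print(spam_probability(["viagra", "free"], spam_counts, ham_counts))
print(spam_probability(["meeting", "report"], spam_counts, ham_counts))
```

Words seen mostly in spam push the result toward 1, and words seen mostly in good mail push it toward 0. A message with no words at all yields 0.5, meaning no evidence either way.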
The Bayesian filter is adaptive, meaning it learns and improves as it encounters more emails. When you first
install Xeams, the Bayesian filter runs in learning mode because it does not yet have enough data to decide whether an email is
junk or good. In learning mode, this filter relies on the other filters to assign a score to the message and then updates its database.
The Bayesian filter manages two databases:
History of spam word count
History of good word count
These databases are stored in the Xeams config folder as SpamWords_001.dat and HamWords_001.dat,
respectively. Both are plain text files with two columns: the word count and the word itself.
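If you want to inspect these files outside Xeams, a two-column text file like this is easy to read. The sketch below assumes that each line holds the count first and the word second, separated by whitespace. The exact delimiter and encoding are not stated here, so check a real file before relying on it. The function name is hypothetical.

```python
def parse_word_counts(lines):
    """Parse 'count word' lines into a dict of word -> count.

    Assumption: columns are whitespace-separated with the count first.
    Lines that do not fit that shape are skipped.
    """
    counts = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue
        count_text, word = parts
        try:
            counts[word.strip()] = int(count_text)
        except ValueError:
            continue  # first column was not a number
    return counts

# Made-up sample lines standing in for the contents of a word file.
sample = ["12 free", "3 meeting", "", "bad line here"]
print(parse_word_counts(sample))
```

To read an actual file, pass an open file object in place of the sample list, since iterating over a text file yields its lines.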
To avoid excessive memory usage, the Bayesian filter stops learning automatically once it reaches a certain word
count, a point called graduation. After graduation, the filter learns only when a user specifically marks a message as good or junk.
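The graduation behavior can be pictured with a small sketch. The class, its method names, and the limit value are hypothetical. The sketch also assumes that "word count" means the total number of word occurrences stored in both databases, which may differ from how Xeams actually measures it.

```python
class LearningFilter:
    """Toy model of a filter that stops automatic learning at graduation."""

    def __init__(self, graduation_limit):
        # Hypothetical threshold; the real value is internal to Xeams.
        self.graduation_limit = graduation_limit
        self.spam_counts = {}
        self.ham_counts = {}

    def total_words(self):
        return sum(self.spam_counts.values()) + sum(self.ham_counts.values())

    def graduated(self):
        return self.total_words() >= self.graduation_limit

    def learn(self, words, is_spam, marked_by_user=False):
        """Update the word counts; return True if learning took place."""
        # After graduation, only explicit user marking triggers learning.
        if self.graduated() and not marked_by_user:
            return False
        target = self.spam_counts if is_spam else self.ham_counts
        for w in words:
            target[w] = target.get(w, 0) + 1
        return True

f = LearningFilter(graduation_limit=3)
print(f.learn(["free", "offer"], is_spam=True))   # learning mode
print(f.learn(["meeting"], is_spam=False))        # reaches the limit
print(f.learn(["prize"], is_spam=True))           # ignored after graduation
print(f.learn(["prize"], is_spam=True, marked_by_user=True))
```

The last call still updates the database because it represents a user marking the message, which is the only path to learning once the filter has graduated.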
Bayesian scoring works in both directions: if the filter determines an email to be junk, a
positive score is assigned; if it considers the email good, a negative score is assigned.
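One simple way to picture a two-directional score is to map a spam probability between 0 and 1 onto a signed range. The linear mapping and the max_score parameter below are assumptions made for illustration. The actual scale and formula Xeams uses are not described here.

```python
def signed_score(spam_probability, max_score=50):
    """Map a probability in [0, 1] to a score in [-max_score, +max_score].

    Hypothetical linear mapping: 0.5 (undecided) becomes 0, values above
    0.5 become positive (junk), values below become negative (good).
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    return round((spam_probability - 0.5) * 2 * max_score)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, signed_score(p))
```

With this kind of mapping, a confident "good" verdict can offset points added by other filters, while a confident "junk" verdict adds to them.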
Improving the Bayesian Filter
When a message gets incorrectly tagged as good or junk, you should mark it appropriately in
the message repository. Marking a message updates the Bayesian database.
We recommend that you first run the message through the Spam Simulator to check whether Bayesian analysis
will benefit from learning. There is no point in making the Bayesian filter learn from a message if it is already determining
the email's category correctly.
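The decision described above amounts to "train only on mistakes". As a rough sketch, assuming you have read the Bayesian score for the message from the Spam Simulator, the check looks like this. The function name is hypothetical, and treating a score of zero as undecided is an assumption.

```python
def benefits_from_learning(bayesian_score, user_says_junk):
    """Return True if marking the message would teach the filter something.

    Positive score = filter thinks junk, negative = filter thinks good.
    A score of 0 is treated as undecided (an assumption of this sketch).
    """
    if bayesian_score == 0:
        return True
    filter_says_junk = bayesian_score > 0
    return filter_says_junk != user_says_junk

print(benefits_from_learning(-20, user_says_junk=True))   # filter was wrong
print(benefits_from_learning(35, user_says_junk=True))    # already correct
```

If the filter's verdict already matches yours, marking the message adds nothing new and can be skipped.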