Mail is incorrectly classified as SPAM

aschlager · October 17, 2018, 12:46pm

Hi,
I’m using this really fantastic UCS system in the latest version as a Mail server.
Now I’m facing some problems with false-positive SPAM mails.
The mails are classified like this:
X-Spam-Flag: NO
X-Spam-Score: 5.161
X-Spam-Level: *****

mail/antispam/requiredhits is set to 6.0, so X-Spam-Flag: NO is correct. The subject is also not tagged with “*** SPAM ***”, the value defined in mail/antispam/headertag.

So far - so good.

But Dovecot delivers these mails into the Spam folder every time. Furthermore, it seems that moving those mails into the Ham folder has no effect: the X-Spam-Score stays the same, even though I now have 30 mails from the same sender in my Ham folder.

So I assume two things:

1. SpamAssassin does not care about mails in the user’s “Ham” folder. I assume this because the spam score does not drop, regardless of how many similar mails I put into the “Ham” folder.
2. Dovecot does not care about the X-Spam-Flag.

Any help highly welcome!

For completeness, here are some relevant UCS registry settings:
mail/antispam/autostart = yes
mail/antispam/bodysizelimit = 512
mail/antispam/headertag = *** SPAM ***
mail/antispam/learndaily = yes
mail/antispam/requiredhits = 6.0
mail/antispam/rules/autoupdate = yes
mail/dovecot/folder/spam = Spam
mail/dovecot/mailboxes/special/Junk = \Junk
mail/dovecot/mailboxes/special/Spam = \Junk

BR.,
-Andreas.

Christian_Voelker · October 17, 2018, 1:34pm

Hi,

I have to admit I am a little unsure about the details of how all these components work together, but this might give you some hints regarding the issue.

First, usually dovecot does not “sort” mails into folders. If this happens, there might be some Sieve scripts activated. One of the usual suspects here is the following:
90-plugins.conf (dovecot)

plugin {
  #setting_name = value
  sieve = /etc/dovecot/sieve/default.sieve
}

default.sieve

require "fileinto";
if header :contains "X-Spam-Flag" "YES" {
    fileinto "Junk";
}

You might need to verify the paths here…

Are there any configurations or scripts like the ones shown above? If no Sieve scripts are active, I assume Dovecot is not the one moving the mails into the Junk folder. Instead it might be your client (Thunderbird?). Check this; Thunderbird has a filter log.

Second, regarding the ham/spam training: SpamAssassin does not automatically read the Junk or Ham folder. The training has to be run explicitly (e.g. from cron), so please check whether any of the cron entries contain such calls. Manually it can be done like this:

sa-learn --spam /home/user/Maildir/.Junk//cur

You need to give the path to the folder containing the spam. Same applies to ham. And as far as I know you need to enable bayes scanning additionally.

Check docs for details, as said just some hints here.

/CV

troeder · October 17, 2018, 3:10pm

Please check the Sieve script in the user’s Maildir.
It was probably created when the UCRV mail/antispam/requiredhits was set to 5.0, and it is not updated automatically when the value changes.
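
One way to check this is to count the asterisks in the X-Spam-Level rule of the user’s Sieve script. The following is only a minimal sketch: the function name is made up, and the script path has to be supplied by you, since the exact location is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print how many asterisks the first X-Spam-Level
# rule in the Sieve script "$1" tests for (prints 0 if there is no such rule).
sieve_spam_threshold() {
    grep 'X-Spam-Level' "$1" | head -n 1 | tr -cd '*' | wc -c | tr -d ' '
}

# Usage (the path is a placeholder, not a real location):
#   sieve_spam_threshold /path/to/the/users/sieve/script
```

If the number printed is lower than mail/antispam/requiredhits, the script is likely still based on the old threshold.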

aschlager · October 17, 2018, 3:17pm

Hi Christian,

many thanks for your fast reply!
Your guess regarding Dovecot was excellent.

Thunderbird is not the one moving the spam mail into the Spam folder; I checked this. But there is a “default.sieve” in the user’s sieve directory, which is generated automatically from /var/lib/dovecot/sieve/default.sieve:

 # Warning: This file is auto-generated and might be overwritten by
 #          univention-config-registry.
 #          Please edit the following file(s) instead:
 # Warnung: Diese Datei wurde automatisch generiert und kann durch
 #          univention-config-registry ueberschrieben werden.
 #          Bitte bearbeiten Sie an Stelle dessen die folgende(n) Datei(en):
 # 
 # 	/etc/univention/templates/files/var/lib/dovecot/sieve/default.sieve
 # 
 
 # Univention Sieve Script - generated on Mon Sep 10 12:53:31 2018
 require ["fileinto", "mailbox"];
 
 # Spamfilter
  if header :contains "X-Spam-Level" "******"  {
 	fileinto :create "Spam";
 	stop;
 }

So Dovecot looks at the “X-Spam-Level” header, not at X-Spam-Flag. That’s easy to correct.
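
As a rough illustration of why such a rule can disagree with X-Spam-Flag: the sketch below assumes that X-Spam-Level carries one asterisk per whole point of the score (5.161 gives five, as in the headers quoted at the top of this thread). Under that assumption a ":contains" test for N asterisks fires for any score of N or more, regardless of mail/antispam/requiredhits. Both function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: build the X-Spam-Level value for a score,
# assuming one asterisk per whole point (integer part of the score).
stars_for_score() {
    whole=${1%%.*}              # drop the fractional part: 5.161 -> 5
    stars=""
    i=0
    while [ "$i" -lt "$whole" ]; do
        stars="${stars}*"
        i=$((i + 1))
    done
    printf '%s\n' "$stars"
}

# Hypothetical helper: mimic a Sieve ':contains' test on X-Spam-Level.
# $1 = spam score, $2 = number of asterisks the Sieve rule looks for.
sieve_rule_fires() {
    level=$(stars_for_score "$1")
    needle=$(stars_for_score "$2")
    case "$level" in
        *"$needle"*) return 0 ;;   # quoted needle: asterisks are literal
        *) return 1 ;;
    esac
}

# A rule with five asterisks files a 5.161 mail into Spam,
# a rule with six asterisks does not.
sieve_rule_fires 5.161 5 && echo "5-star rule: filed into Spam"
sieve_rule_fires 5.161 6 || echo "6-star rule: stays in the inbox"
```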

Regarding Spamassassin: There is a default cron.daily entry “univention.spamassassin”:
/usr/sbin/univention-sa-learn >> /var/log/univention/spamassassin-learn.log

The logfile states messages like this:

Learned tokens from 0 message(s) (2 message(s) examined)
Learned tokens from 5 message(s) (75 message(s) examined)
Learned tokens from 0 message(s) (19 message(s) examined)

The log does not say which mailboxes or folders were scanned for learning.
So this does not seem to be enough. Any idea how to set up spam/ham training correctly?
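
For what it is worth, the log lines can at least be totalled to see how many messages were examined versus actually learned from. This is a minimal sketch that assumes every relevant line has exactly the format quoted above; the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: total the "Learned tokens from N message(s)
# (M message(s) examined)" lines found in the log file "$1".
summarize_learn_log() {
    awk '/^Learned tokens from/ {
        learned  += $4              # field 4 is N
        examined += substr($6, 2)   # field 6 is "(M"; skip the parenthesis
        runs++
    }
    END { printf "runs=%d learned=%d examined=%d\n", runs, learned, examined }' "$1"
}

# Usage with the log file named in the cron entry:
#   summarize_learn_log /var/log/univention/spamassassin-learn.log
```

A "learned" count far below the "examined" count can simply mean that most messages had already been learned in an earlier run.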

Best regards!

-Andreas.

Moritz_Bunkus · October 19, 2018, 9:55am

Hey,

univention-sa-learn is simply a shell script. Taking a look at it reveals that mail in folders named Junk and Spam is considered to be spam, whereas mail in folders named Ham is considered not to be spam.

Kind regards
mosu

aschlager · October 19, 2018, 1:21pm

Hi,

Good point. I checked the script, and in my opinion it won’t work.

Here are the two main commands in which it uses “sa-learn” to train SpamAssassin:

find /var/spool/dovecot/private/ \( -wholename \*/\*/Maildir/.Spam -o -wholename \*/\*/Maildir/.Junk \) \
	-exec $SA_LEARN --dbpath /var/lib/amavis/.spamassassin --spam {} \;

find /var/spool/dovecot/private/ -wholename \*/\*/Maildir/.Ham \
	-exec $SA_LEARN --dbpath /var/lib/amavis/.spamassassin --ham {} \;

The mistake, in my opinion, is that the folders it passes to sa-learn contain mail in Maildir format. The “.Ham” folder of one user, for example, has this structure:

drwx--S--- 2 dovemail dovemail  4096 Okt 15 19:07 cur
-rw------- 1 dovemail dovemail   848 Sep 17 10:08 dovecot.index
-rw------- 1 dovemail dovemail 14600 Okt 16 22:19 dovecot.index.cache
-rw------- 1 dovemail dovemail 11056 Okt 15 19:08 dovecot.index.log
-rw------- 1 dovemail dovemail    10 Mai 31 21:13 dovecot-keywords
-rw------- 1 dovemail dovemail  1079 Okt 15 19:07 dovecot-uidlist
-rw------- 1 dovemail dovemail     0 Mär 17  2018 maildirfolder
drwx--S--- 2 dovemail dovemail  4096 Aug 12 22:28 new
drwx--S--- 2 dovemail dovemail  4096 Okt 15 19:07 tmp

So “sa-learn” should scan the “cur” and the “new” folder here, otherwise training has no effect.

I now adapted the script in this way:

...
for folder in cur new; do
	find /var/spool/dovecot/private/ \( -wholename \*/\*/Maildir/.Spam/$folder -o -wholename \*/\*/Maildir/.Junk/$folder \) \
		-exec $SA_LEARN --dbpath /var/lib/amavis/.spamassassin --spam {} \;

	find /var/spool/dovecot/private/ -wholename \*/\*/Maildir/.Ham/$folder \
		-exec $SA_LEARN --dbpath /var/lib/amavis/.spamassassin --ham {} \;
done
...

I will report back on whether this worked. I need to receive a few mails first.

BR.,
-Andreas.

Moritz_Bunkus · October 19, 2018, 1:57pm

Hey,

sa-learn works recursively through directories given on the command line:

[0 root@backup2 ~] mkdir tmp sa-db
[0 root@backup2 ~] cd tmp
[0 root@backup2 ~/tmp] mkdir subfolder
[0 root@backup2 ~/tmp] echo nonono > subfolder/file.txt
[0 root@backup2 ~/tmp] sa-learn --dbpath ~/sa-db --spam ~/tmp
Learned tokens from 1 message(s) (1 message(s) examined)

But you’re right insofar as it should process only those files in the cur and new sub-folders, not the additional housekeeping files used by Dovecot: dovecot.index, dovecot.index.cache etc.
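
The difference can be made visible without touching the Bayes database. The sketch below assumes that a recursive walk visits every regular file under the folder, as the transcript above suggests; both function names are made up.

```shell
#!/bin/sh
# Hypothetical helper: every regular file below the Maildir folder "$1",
# i.e. what a recursive walk would encounter (index files included).
files_seen_recursively() {
    find "$1" -type f
}

# Hypothetical helper: only the message files in cur and new of "$1".
message_files_only() {
    find "$1/cur" "$1/new" -type f
}

# Usage (the path is a placeholder for one user's Ham folder):
#   files_seen_recursively /path/to/Maildir/.Ham | wc -l
#   message_files_only     /path/to/Maildir/.Ham | wc -l
```

Any gap between the two counts corresponds to housekeeping files that are not mail.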

You should file a bug report for this.

m.

nirjhar · October 25, 2021, 9:54am

I have a strange problem. I have two mail servers for two domains. With the same configuration, one server marks spam and rewrites the incoming headers, but the other does not. How can I debug SpamAssassin? I am using Pyzor on both.

Thanks in advance.