Incoming Spam

Quite a lot of work has been put into the Gmane anti-spam functionality over the years. This web page tries to give an overview of how it all works.

First of all, all incoming mail is run through the clamav virus scanner. If it says that something is a virus, then the mail is redirected to the special hidden group gmane.spam.virus. The clamav virus database is updated automatically every day.

All mail to Gmane arrives from mailing lists. So, the next check is a set of simple heuristics to determine whether the mail looks like it actually came from a mailing list server. If not, the mail is sent to a special, hidden group called gmane.junk, which is only visible to the Gmane administrators, who trundle through it now and then to see whether anything has been misclassified, and tweak the conf file if it has.
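To make the idea of a mailing-list heuristic concrete, here is a minimal sketch in Python. The actual rules live in Gmane's conf file and aren't described here, so the header names checked below (List-Id, List-Post, List-Unsubscribe, Mailing-List, Precedence) and the function name looks_like_list_mail are assumptions picked for illustration, not Gmane's real configuration.

```python
from email import message_from_string
from email.message import Message

# Headers commonly added by list software. This set is an assumption,
# not Gmane's actual configuration.
LIST_HEADERS = ("List-Id", "List-Post", "List-Unsubscribe", "Mailing-List")


def looks_like_list_mail(msg: Message) -> bool:
    """Return True if the message carries typical mailing-list markers."""
    if any(msg.get(name) for name in LIST_HEADERS):
        return True
    # Many list servers also mark their traffic with a Precedence header.
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in ("list", "bulk")


# Two hypothetical messages: one relayed by a list, one sent directly.
list_mail = message_from_string(
    "From: alice@example.org\n"
    "List-Id: <dev.example.org>\n"
    "Subject: patch\n\nbody\n"
)
direct_mail = message_from_string(
    "From: mallory@example.net\nSubject: buy now\n\nbody\n"
)
```

Under these assumptions, list_mail would pass the check and direct_mail would end up in gmane.junk.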

If it looks like the mail came from a mailing list, it's passed to SpamAssassin, which performs a bunch of tests to see whether the mail looks like it's spam. In addition to the normal pattern checks, this includes RBL checks and statistical Bayes classification. If SpamAssassin says that the mail is spam, it's still posted to the recipient group, but it's also cross-posted to the special gmane.spam.detected group. These articles are ignored by the search engine and the web interface, but can be read via the news interface. People who don't like reading spam can kill based on the Xref header line.
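The three checks described so far can be sketched as a small routing function. This is only an illustration of the order of the checks: the three predicates are passed in as plain callables standing in for clamav, the mailing-list heuristics and SpamAssassin, and the names route and xref_marks_spam are made up for this example. The second function shows how a news reader could spot detected spam from the Xref header, assuming the usual "host group:number group:number" layout.

```python
from typing import Callable, List

SPAM_GROUP = "gmane.spam.detected"


def route(
    message: str,
    recipient_group: str,
    is_virus: Callable[[str], bool],
    is_list_mail: Callable[[str], bool],
    is_spam: Callable[[str], bool],
) -> List[str]:
    """Return the groups a message is posted to, in the order of the checks."""
    if is_virus(message):
        return ["gmane.spam.virus"]
    if not is_list_mail(message):
        return ["gmane.junk"]
    groups = [recipient_group]
    if is_spam(message):
        # Detected spam is still posted, but also cross-posted.
        groups.append(SPAM_GROUP)
    return groups


def xref_marks_spam(xref: str) -> bool:
    """Return True if an Xref header value lists the detected-spam group."""
    # The first field is the server name; the rest are "group:number" pairs.
    entries = xref.split()[1:]
    return any(entry.rsplit(":", 1)[0] == SPAM_GROUP for entry in entries)
```

A kill rule in a news reader would do the equivalent of xref_marks_spam on each article's Xref line and hide the article when it matches.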

In any case, articles that are posted to groups can be reported as spam (if SpamAssassin didn't detect them) or as ham (if SpamAssassin reported them as spam, and they're not). In the spam case, this can only be done from the web interface, by clicking a link.

These reports are reviewed by the Gmane administrators, who accept or reject them. Articles that have been handled by the administrators are forever tagged as confirmed ham or spam, and can't be reported again. When a spam report is accepted, the message in question disappears from the web interface immediately, and is cross-posted to the spam group on the news interface at a later date.
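The life cycle of a report can be sketched as a tiny state machine. This is a hypothetical model, not Gmane's code: the class and method names are invented, and it assumes that rejecting a report confirms the opposite verdict (a rejected spam report tags the article as confirmed ham, and vice versa).

```python
from enum import Enum


class Status(Enum):
    UNREVIEWED = "unreviewed"
    CONFIRMED_SPAM = "confirmed spam"
    CONFIRMED_HAM = "confirmed ham"


class Article:
    def __init__(self, detected_as_spam: bool) -> None:
        self.detected_as_spam = detected_as_spam
        self.status = Status.UNREVIEWED
        self.pending_report = None  # "spam", "ham", or None

    def report(self, kind: str) -> bool:
        """File a report; return False if the report isn't allowed."""
        if self.status is not Status.UNREVIEWED:
            return False  # already handled by an administrator
        # Undetected articles can be reported as spam, detected ones as ham.
        expected = "ham" if self.detected_as_spam else "spam"
        if kind != expected:
            return False
        self.pending_report = kind
        return True

    def review(self, accept: bool) -> None:
        """Administrator decision; the resulting tag is permanent."""
        if self.pending_report is None:
            raise ValueError("no pending report")
        claimed_spam = self.pending_report == "spam"
        # Assumption: rejecting a report confirms the opposite verdict.
        is_spam = claimed_spam if accept else not claimed_spam
        self.status = Status.CONFIRMED_SPAM if is_spam else Status.CONFIRMED_HAM
        self.pending_report = None
```

Once review has been called, any further report on the same article returns False, which mirrors the "can't be reported again" rule.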

You can find more details about the approval/rejection rates here.

Finally, the accepted spam reports are fed back to the SpamAssassin statistical Bayes database, making it ever more accurate. So not only does reporting spam make the specific spam in question disappear from the archive, it also helps with identifying subsequent similar spams.
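To see why feeding confirmed spam back helps with later, similar messages, here is a toy token-count scorer with add-one smoothing. It is far simpler than SpamAssassin's real Bayes classifier and isn't meant to describe it: the ToyBayes class, the whitespace tokenizer and the averaging of per-token scores are all simplifications invented for this sketch.

```python
from collections import Counter


class ToyBayes:
    def __init__(self) -> None:
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()

    def train(self, text: str, is_spam: bool) -> None:
        """Count each distinct token of a confirmed message once."""
        target = self.spam_tokens if is_spam else self.ham_tokens
        target.update(set(text.lower().split()))

    def token_spam_probability(self, token: str) -> float:
        # Add-one smoothing: a token never seen before scores 0.5.
        spam = self.spam_tokens[token] + 1
        ham = self.ham_tokens[token] + 1
        return spam / (spam + ham)

    def score(self, text: str) -> float:
        """Average the per-token scores (a simplification, not real Bayes)."""
        tokens = set(text.lower().split())
        if not tokens:
            return 0.5
        return sum(self.token_spam_probability(t) for t in tokens) / len(tokens)


classifier = ToyBayes()
before = classifier.score("cheap pills today")
# An accepted spam report is fed back as training data.
classifier.train("cheap pills online now", is_spam=True)
after = classifier.score("cheap pills today")
```

Here after is higher than before, because two of the three tokens in the new message were seen in the confirmed spam.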

Spammers. Spammers. Spammers. It's enough to drive anyone insane. Ha. Ha.