Gmane

Details of how posting works through Gmane

The first thing you have to do to understand how Gmane does news-to-mail is to forget all you know about how news gateways work. Gmane doesn't work the way you think.

When you post a message via NNTP, the news server (inn) takes the message and does the usual inn syntax checks (empty body, validish headers, etc.). It then runs a Perl script written by moi which does rate limiting and address validation.

Phase One: Rate Limiting

Based on the IP address of the host that connects to the news server, the script keeps track of how fast that host posts messages. If the host reaches a specific limit, the news server starts rejecting its messages with the message "You post too much; try again later". The limit in question may change over time, but it's just a limit to avoid having aggressive mass-posting spammers overloading the MTA that does further processing of the messages. It's not an anti-spam measure per se.
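To make the idea concrete, here is a minimal sketch of per-IP rate limiting in Python. It is not the actual Perl script: the sliding-window approach, the class name RateLimiter, and the numbers (5 posts per 60 seconds) are all assumptions made up for illustration, since the real limit isn't stated and may change.

```python
import time
from collections import defaultdict, deque

# Hypothetical values; the real limit is not documented and may change.
MAX_POSTS = 5
WINDOW_SECONDS = 60.0


class RateLimiter:
    """Sliding-window counter keyed on the connecting host's IP address."""

    def __init__(self, max_posts=MAX_POSTS, window=WINDOW_SECONDS):
        self.max_posts = max_posts
        self.window = window
        self._seen = defaultdict(deque)  # IP -> timestamps of accepted posts

    def allow(self, ip, now=None):
        """Return True if a post from `ip` is accepted at time `now`."""
        now = time.monotonic() if now is None else now
        stamps = self._seen[ip]
        # Forget posts that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            # The caller would reject with
            # "You post too much; try again later".
            return False
        stamps.append(now)
        return True
```

The limiter only looks at the connecting IP address, never at the message, which fits the point that this protects the MTA rather than filtering spam.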

Phase Two: Address Validation

Let's say I post with the address "lala@gnus.org". The script first checks whether there are any MX records for that domain:

[larsi@quimbies ~]$ host -t mx gnus.org
gnus.org                MX      0 a.mx.gnus.org

There is one; if there weren't, the script would have fallen back to the A record.

The script then tries to connect to the MTA. If there's more than one MX record, it connects to the servers in preference order.
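The host selection described above can be sketched as a small pure function. The DNS lookup itself is left out; callout_hosts is a hypothetical name, and it assumes some resolver has already returned the MX records as (preference, hostname) pairs.

```python
def callout_hosts(domain, mx_records):
    """Return the hosts to try, lowest MX preference value first.

    `mx_records` is a list of (preference, hostname) pairs from a DNS
    lookup done elsewhere; an empty list means there are no MX records.
    """
    if not mx_records:
        # No MX records: fall back to the domain's own A record.
        return [domain]
    # Tuples sort on preference first, so the most preferred MX leads.
    return [host.rstrip(".") for _, host in sorted(mx_records)]
```

For the example above, callout_hosts("gnus.org", [(0, "a.mx.gnus.org")]) gives a one-element list containing "a.mx.gnus.org".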

And then it does the traditional call-out to check whether the MTA in question really accepts the address:

[larsi@quimbies ~]$ nc a.mx.gnus.org smtp
220 quimby.gnus.org ESMTP Exim 3.35 #1 Wed, 06 Apr 2005 01:02:15 +0200
HELO sea.gmane.org
250 quimby.gnus.org Hello sea.gmane.org [80.91.229.5]
MAIL FROM:<auth@gmane.org>
250 <auth@gmane.org> is syntactically correct
RCPT TO:<lala@gnus.org>
550 Unknown local part lala in <lala@gnus.org>
QUIT
221 quimby.gnus.org closing connection

In this case, the MTA said that it didn't know the address, so the message is refused.

(Other things that may lead to the message being refused are read-only mailing lists or discontinued mailing lists.)

Addresses that are valid are stored in a white-list. Addresses already in the white-list won't have these checks performed.
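The call-out and the white-list together might look roughly like this with Python's smtplib. Again, this is a sketch and not the real script: the HELO name and probe sender are copied from the transcript above, while address_ok, the in-memory set, the 30-second timeout, and the choice to treat every non-2xx reply (and an unreachable MTA) as "refused" are assumptions.

```python
import smtplib

HELO_NAME = "sea.gmane.org"      # as seen in the transcript above
PROBE_SENDER = "auth@gmane.org"  # as seen in the transcript above

whitelist = set()  # addresses that have already passed the call-out


def address_ok(address, hosts, connect=smtplib.SMTP):
    """Return True if an MTA in `hosts` accepts RCPT TO for `address`."""
    if address in whitelist:
        return True  # already validated; skip the checks entirely
    for host in hosts:
        try:
            # Leaving the `with` block sends QUIT and closes the socket.
            with connect(host, 25, timeout=30) as smtp:
                smtp.helo(HELO_NAME)
                smtp.mail(PROBE_SENDER)
                code, _ = smtp.rcpt(address)
        except OSError:
            continue  # unreachable or dropped us; try the next host
        if 200 <= code < 300:
            whitelist.add(address)
            return True
        return False  # the MTA answered and refused, like the 550 above
    return False  # assumption: no reachable MTA counts as a refusal
```

The connect parameter exists only so the function can be exercised with a stand-in object instead of a live SMTP server.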

Address validation isn't an anti-spam measure, either. It's purely a user interface issue. When the user tries posting with an invalid address, she will be told immediately that the address is invalid, and can fix that, instead of having the message disappear into /dev/null in the next phase.

Which is:

Phase Three: Challenge/Response

We now know that the message has a From address that's valid, but we don't know that the person who owns that mail account actually posted the message.

So instead of posting the message to the news group, the message is passed on to a new set of scripts. (On a different server, even.)

Cross-posting is first checked, and for each group in question, a C/R is potentially generated.

The C/R consists of an email message sent to the address in question saying that Gmane has received a message, and to have it passed on to the mailing list, the user has to answer the message.

When the user responds to this message (via mail), the message is released to the next phase. A C/R only has to be done once per address/mailing list pair. That is, if the user has responded to the C/R for a specific mailing list once, she'll never have to do that again.

If there's more than one outstanding C/R, all messages but the last one will be deleted.
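The bookkeeping in this phase boils down to two small tables. Here is a minimal in-memory sketch; ChallengeQueue and its method names are made up, and it assumes that "outstanding" is counted per address/mailing list pair, which the description above doesn't spell out.

```python
class ChallengeQueue:
    """Tracks answered challenges and the messages still held back."""

    def __init__(self):
        # (address, mailing list) pairs that have answered a C/R once.
        self.confirmed = set()
        # (address, mailing list) -> the single message being held.
        self.pending = {}

    def submit(self, address, mailing_list, message):
        """Return the message if it may go out now, else hold it."""
        key = (address, mailing_list)
        if key in self.confirmed:
            return message  # this pair never needs another C/R
        # A newer message replaces any older one still waiting.
        self.pending[key] = message
        return None  # the caller would send the challenge email here

    def confirm(self, address, mailing_list):
        """The user answered: release the held message, if any."""
        key = (address, mailing_list)
        self.confirmed.add(key)
        return self.pending.pop(key, None)
```

A cross-posted message would simply go through submit() once per group, so each group gets its own challenge unless that pair is already confirmed.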

Phase Four: Sending to the Mailing List

Finally, the message is sent to the mailing list. The mailing list may or may not accept the message, but that's not really my problem.

The original From header is preserved -- Gmane doesn't rewrite any of the headers. However, the RFC2821 MAIL FROM envelope is set.

Gmane can't forge messages as coming from somewhere they don't come from, in these SPF days. So the MAIL FROM envelope has to be something@gmane.org. It is currently the address that Gmane is subscribed to, but that may change in the future.
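The difference between the From header and the MAIL FROM envelope is easy to show with Python's smtplib, where the envelope sender is passed separately from the message. This is only an illustration: relay is a hypothetical function, and the address "subscriber@gmane.org" is a placeholder, not the address Gmane actually uses.

```python
from email.message import EmailMessage

# Placeholder; the real envelope address is not given here.
ENVELOPE_SENDER = "subscriber@gmane.org"


def relay(smtp, message, list_address):
    """Send `message` to the list with headers left exactly as posted."""
    # from_addr only sets the SMTP envelope (MAIL FROM); the From header
    # inside the message still names the original poster.
    smtp.send_message(
        message,
        from_addr=ENVELOPE_SENDER,
        to_addrs=[list_address],
    )
```

Here smtp would be a connected smtplib.SMTP object; the mailing list sees the poster in the From header, while SPF checks are made against the gmane.org envelope address.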

Phase Five: Sending an Email to Lars to Complain That Phase Four Didn't Work

These all get the form answer "Read the FAQ". No matter what.

Gratuitous Statistics