Importing Archives Into Gmane

Mailing list archives can be imported into Gmane.

Archives to be imported can be in one of two formats: either a tar file of a one-message-per-file directory, where the file names increase numerically, or a Unix mbox file. No other formats are accepted. A Unix mbox file is preferred.
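If your archive is a one-message-per-file directory and you would rather submit the preferred mbox format, a conversion can be sketched with Python's standard mailbox module. This is a minimal sketch, not part of Gmane: the function names and paths are made up for illustration, and it assumes each file holds exactly one RFC 822 message and that every message file has a purely numeric name.

```python
import mailbox
import os


def numeric_files(directory):
    """Return the purely numeric file names in directory, in numeric order."""
    names = [name for name in os.listdir(directory) if name.isdigit()]
    return sorted(names, key=int)  # so that 10 sorts after 2, not before it


def directory_to_mbox(directory, mbox_path):
    """Append every numbered message file to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for name in numeric_files(directory):
            with open(os.path.join(directory, name), "rb") as handle:
                box.add(handle.read())  # raw bytes of one message
            count += 1
    finally:
        box.close()  # flushes pending writes to disk
    return count
```

Calling directory_to_mbox("archive-dir", "list.mbox") would write the messages in numeric file-name order; files whose names are not numbers are left out.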

If you wish to have an archive of a mailing list you administer imported into Gmane, send a mail to Lars with the URL of the mailing list archive and the name of the group it should be imported into. Duplicates are ignored when doing an import, so a complete archive of the list is OK -- no pre-filtering of messages is necessary.
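To illustrate what ignoring duplicates can look like, here is a minimal sketch that skips any message whose Message-ID has been seen before. How Gmane actually detects duplicates is not described here, so matching on Message-ID is an assumption, and the names new_messages and known_ids are hypothetical.

```python
import mailbox


def new_messages(archive_path, known_ids):
    """Yield messages from an mbox whose Message-ID is not already known.

    Assumption: two messages are duplicates when their Message-ID
    headers match. Messages without a Message-ID are skipped in this
    sketch because they cannot be compared.
    """
    seen = set(known_ids)
    box = mailbox.mbox(archive_path, create=False)
    try:
        for message in box:
            message_id = (message.get("Message-ID") or "").strip()
            if not message_id or message_id in seen:
                continue  # already stored, repeated in the archive, or no ID
            seen.add(message_id)
            yield message
    finally:
        box.close()
```

Because the filter also remembers the IDs it has yielded, a message that occurs twice inside the archive itself is passed on only once.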

The list admin/owner should OK this before the archive is imported. If you're the list admin, please say so in the email where you request the import. If not, please get in touch with the list admin first and get approval before you request the import. The list admin often has access to an mbox format mail archive for the list, so ask for the URL of the archive at the same time.

For the technically inclined, here's how a mailing list archive import is done. It's not always as straightforward as it may seem.

  • If there are no articles already in the group, the archive is simply imported.

  • If there are already articles in the group, things get a bit more complicated, since Gmane tries to keep at least a loose correlation between the order of the article numbers and the sequence in which the messages were posted.

    1. Let's say there are already articles 1-1000 in the group, and there are 2000 articles in the archive that are not yet stored in the group.

    2. Reception of new articles for the group is temporarily disabled.

    3. The archive is imported into the group, ignoring any articles that have already been stored in the group. The articles from the archive get article numbers 1001-3000.

    4. Articles 1-1000 are renamed to 3001-4000.

    5. Using a hacked-up version of the INN prunehistory command, the storage tokens for these moved articles are altered.

    6. The overview file for the group is regenerated.

    7. Any articles that arrived while doing this operation are handled and injected into the group.

    This means that if you've read articles in the group before doing the import, they'll suddenly become unread again, since they're assigned new article numbers. This is inconvenient, but it's a one-time inconvenience. Having the articles permanently out of sequence would be a permanent inconvenience.

    The web interface to the articles will still respect the old article numbering as well as the new, and both refer to the same article after one of these renumberings.
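The numbering arithmetic in the steps above can be sketched as follows. This covers only the bookkeeping, not the actual article storage, history, or overview handling, and the function names are hypothetical. The resolve function is a guess at how a lookup that accepts both old and new numbers could behave, not a description of Gmane's web interface code.

```python
def plan_renumbering(existing_numbers, archive_count):
    """Plan article numbers for importing an archive into a group.

    Returns (archive_numbers, old_to_new): the numbers given to the
    imported articles, and a mapping from each pre-existing article
    number to the number it is renamed to.
    """
    existing = sorted(existing_numbers)
    if not existing:
        # Empty group: nothing to move. Starting at 1 is an assumption.
        return range(1, archive_count + 1), {}
    first_free = existing[-1] + 1
    # Imported articles take the numbers right after the existing ones.
    archive_numbers = range(first_free, first_free + archive_count)
    # Existing articles are then moved behind the imported ones.
    first_moved = first_free + archive_count
    old_to_new = {old: first_moved + i for i, old in enumerate(existing)}
    return archive_numbers, old_to_new


def resolve(number, old_to_new):
    """Map an old article number to its new one; pass new numbers through."""
    return old_to_new.get(number, number)


archive_numbers, old_to_new = plan_renumbering(range(1, 1001), 2000)
print(archive_numbers[0], archive_numbers[-1])  # 1001 3000
print(old_to_new[1], old_to_new[1000])          # 3001 4000
print(resolve(500, old_to_new))                 # 3500
```

With the figures from the example, the old numbers 1-1000 are no longer in use after the move, so an old number and a new number never collide and both can be resolved to the same article.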