
Archiving email: Thunderbird tags not ready for prime time yet

Thunderbird stores email in plain-text files (mbox format) with names like INBOX and mysweetiesemails (no file extensions). Alongside each mbox file sits another plain-text file with the same name plus .msf, e.g. INBOX.msf and mysweetiesemails.msf. These .msf files are mail summary files that Thunderbird uses to speed up actions that would otherwise require repeated trawls through a large text file. They are regenerated as needed and can be deleted.
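The .msf files are plain text too, in Mozilla's Mork database format, but they are hard to read by eye. As a rough illustration only, here is a regex-based peek that pulls the column names out of the Mork column dictionary (the block that starts with `<(a=c)>`) and labels cells that hold a literal value. The helpers `column_names` and `literal_cells` are hypothetical; this is not a real Mork parser, and it ignores atom references, escapes and the table/row structure.

```python
import re

# Hypothetical helpers: a rough peek at a Mork (.msf) file, NOT a full parser.
# Ignored on purpose: atom-valued cells such as (^C6^91), Mork escapes,
# transaction groups and table/row structure.

# The column dictionary is the dict that begins with <(a=c)>.
COLUMN_DICT_RE = re.compile(r"<\s*<\(a=c\)>(.*?)>", re.DOTALL)
# Column definitions look like (8A=label): hex id, '=', column name.
DEF_RE = re.compile(r"\(([0-9A-Fa-f]+)=([^)]*)\)")
# Cells holding a literal value look like (^8A=0).
CELL_RE = re.compile(r"\(\^([0-9A-Fa-f]+)=([^)]*)\)")


def column_names(mork_text):
    """Map upper-cased hex column ids to column names."""
    names = {}
    for block in COLUMN_DICT_RE.findall(mork_text):
        for col_id, name in DEF_RE.findall(block):
            names[col_id.upper()] = name
    return names


def literal_cells(mork_text):
    """Yield (column name, literal value) for each literal-valued cell."""
    names = column_names(mork_text)
    for col_id, value in CELL_RE.findall(mork_text):
        # Unknown ids are shown as "?<id>" rather than dropped.
        yield names.get(col_id.upper(), "?" + col_id), value
```

Usage would be something like reading `INBOX.msf` with `open(path, encoding="latin-1")` (an assumption, chosen because it decodes any byte) and printing the pairs from `literal_cells`. A cell like `(^8A=0)` then shows up as `label 0`.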

The good thing about mbox, I think, is that in the end it is not proprietary and it is plain text. Thunderbird has a large enough user base that whatever comes after it will have a way to import mbox mail (if not use mbox directly).
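To illustrate the portability point: Python's standard `mailbox` module can read an mbox file with no Thunderbird involved. A minimal sketch, assuming a plain Thunderbird mbox file such as INBOX; `list_messages` is a hypothetical helper name. It is safest to run it on a copy, or with Thunderbird closed.

```python
import mailbox


def list_messages(mbox_path):
    """Return (Date, From, Subject) header values for each message.

    Hypothetical helper; missing headers come back as None.
    """
    # create=False: raise an error instead of creating an empty mbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg["Date"], msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

For example, `for date, sender, subject in list_messages("INBOX"): print(date, sender, subject)`.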

I wanted to archive my email off my university's server into local folders on my computer. Thunderbird's "Local Folders" can be pointed at any folder on your computer via Tools -> Account Settings.

I had organized my mail on my POP server into folders by year, plus some separate folders for special classes of email (e.g. rejection notices from journals...). When I moved the emails, I moved them year by year into archive folders on my computer (Local Folders).

Now, dumping all the mail into one archive "folder" (which translates to one Thunderbird mbox file) would work, but I believe that single file would balloon in size and eventually become unmanageable, so I settled on one folder (mbox file) per year.
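If the mail already sits in one big mbox, a per-year layout like this can also be produced after the fact. The sketch below is hypothetical: `split_by_year` and the `archive-{year}` file naming are my own choices. It keys on each message's Date header, copies rather than moves, and should be run with Thunderbird closed. Thunderbird would regenerate the .msf files for the new mboxes.

```python
import mailbox
from email.utils import parsedate_to_datetime


def split_by_year(src_path, dest_pattern="archive-{year}"):
    """Copy every message in src_path into one mbox file per year.

    Hypothetical helper; the dest_pattern naming is an assumption.
    Messages with a missing or unparsable Date header go to "unknown".
    Returns {year: number of messages copied}.
    """
    src = mailbox.mbox(src_path, create=False)
    targets = {}  # year -> open destination mbox
    counts = {}
    try:
        for msg in src:
            try:
                year = str(parsedate_to_datetime(msg["Date"]).year)
            except (TypeError, ValueError, IndexError):
                year = "unknown"
            if year not in targets:
                # Creates the file if needed, appends if it already exists.
                targets[year] = mailbox.mbox(dest_pattern.format(year=year))
            targets[year].add(msg)
            counts[year] = counts.get(year, 0) + 1
    finally:
        src.close()
        for box in targets.values():
            box.close()  # close() also flushes pending additions to disk
    return counts
```

Note that this only reproduces the by-year split. Any finer organization (like the "rejection notices" folder) is exactly what gets lost, which is the problem the rest of this post is about.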

The transfer from the POP server to the local archive folders went fine, but of course I lost my special organization: the email from a magazine editor complimenting me on my work in 2005 (important to me) got mixed in with several hundred 2005 emails between me and my colleagues sorting out trivial, daily work-related stuff (not so important).

To me, this highlights the need for some kind of categorization that travels with each individual email, so it won't get lost when you move the email around. Images already work this way: ratings, tags and captions are now stored inside the JPEG as metadata, so you don't need separate files that have to travel with your pictures.

Thunderbird's tag (formerly label) system covers this. But be aware that there are no standards for tags or labels in email, and that the mbox storage format, though popularly supported, is not a formal standard and comes in several inconsistent variants. So tags recognized by Thunderbird 2.0 may not be recognized by Thunderbird 3.0 or FutureMailClient 1.0. Heck, people have had problems when mixing labels (TB 1.5) and tags (TB 2.0).

Anyhow, Thunderbird supposedly stores tag information in the message itself, in the "X-Mozilla-Keys" header, and I got all excited until I found out that for local folders the tags are stored in the .msf file and not in the header [Bug 378973]. So, for the moment, I've given up on tagging or labeling until the dust settles. I may even go back to separate folders.
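To check which messages actually carry tags in the file itself, a small sketch can read X-Mozilla-Keys from each message in an mbox. `tags_in_headers` is a hypothetical helper. It assumes the header holds whitespace-separated tag keywords and treats a missing or blank header as "no tags in the file". For local folders, per the bug above, that may just mean the tags live only in the .msf.

```python
import mailbox


def tags_in_headers(mbox_path):
    """Return (Subject, [tag keywords]) for each message in an mbox file.

    Hypothetical helper: assumes X-Mozilla-Keys holds whitespace-separated
    keywords; a missing or blank header yields an empty list.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg["Subject"], (msg["X-Mozilla-Keys"] or "").split())
                for msg in box]
    finally:
        box.close()
```

Running it over a yearly archive and finding only empty lists would match the behavior described in the bug: the tags never made it into the mbox.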

Incidentally, http://lxr.mozilla.org/ is a good place to browse Mozilla source code. Thunderbird stuff is found here. The code for handling tags is here.

Comments

  1. Where does it store tag info for IMAP Maildirs? I had to manipulate a user's messages directly on the mail server and let Tbird reindex, which wiped out her existing tags. I've now retagged a few messages in order to have something to grep for, but I cannot figure out where they're stored, so I can't use the backup to try to restore her previous tag data.

  2. Okay, found it in the local .msf, but I can't make heads or tails of it. I made copies of two versions of INBOX.msf, one where a single message was tagged and one where it wasn't, then diffed them. The msf header lines are fairly straightforward; we find (8A=label) but nothing for tag except (C6=current-view-tag), which isn't it.

    The diff is an additional 13 lines, but the only instance of 8A is (^8A=0), which appears in every message's entry. I'd sure like to find an msf -> human-readable parser.

  3. Hi Ludwig,

    Unfortunately, I didn't go too deep into the .msf files.

    Why are you looking to manipulate the .msf files yourself?

  4. Also, TBird can be forced to store labels in the mbox; see this post.

