Tuesday, December 06, 2005

 

Search engine from a Googler

Nelson Minar, a Google engineer in San Francisco, has released his very own search engine for e-mail. It might sound unbelievable, but it is exactly what it sounds like: a Google employee made his own tool that competes (somewhat) with his employer's Google Desktop Search.
The Funes search engine indexes mbox e-mail archives (mbox is Thunderbird's native format, and most other e-mail clients can at least export to it) and is used from the command line. From Nelson's announcement:

I have eight years of email archives. These archives are my memory. But my memory is awkward, it is hard for me to find things in it. When did I last write my old friend? What all have I thought about Jorge Luis Borges? Who wrote me email in early April, 2001?

Funes is a Java program that enables you to search your memories. At its core is a search engine: Funes indexes all of your email into a quickly-searchable database and then lets you query that database. The search engine itself is implemented with Lucene. Funes adds the glue to parse mailboxes and interact with you via a command line interface.

There are other search tools out there for email, for example grepmail or mg. I wrote Funes because I wanted my own tool to work my own way, because I needed something to keep out of trouble while I was looking for work, and because Lucene was so cool. Mostly, I wrote it because my memory is important to me.

Funes is currently minimally usable and has much work to go. I am not likely to work on it in the near future. It is available as free software according to the GNU Public License. If you try it, please let me know.

I will give it a try and let you know how it works.
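
In the meantime, to get a feel for what "index all of your email, then query it" means, here is a minimal sketch in Python. It is not Funes's code: Funes is written in Java and uses Lucene, while this sketch uses only Python's standard `mailbox` module and a plain in-memory inverted index. The function names (`tokenize`, `body_text`, `build_index`, `search`) are my own, made up for illustration, and the query logic is a simple "all words must match" rule rather than anything Lucene does.

```python
import mailbox
import re
from collections import defaultdict

# Assumption: a "term" is any run of ASCII letters or digits, lowercased.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return TOKEN.findall(text.lower())


def body_text(message):
    """Return the first text/plain part as a string, or '' if there is none."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                # Simplification: decode as UTF-8 and replace anything invalid.
                return payload.decode("utf-8", errors="replace")
    return ""


def build_index(mbox_path):
    """Read an mbox file and build a term -> message-position index."""
    index = defaultdict(set)  # term -> set of message positions
    headers = []              # position -> (date, sender, subject)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for pos, msg in enumerate(box):
            date = str(msg.get("Date", ""))
            sender = str(msg.get("From", ""))
            subject = str(msg.get("Subject", ""))
            headers.append((date, sender, subject))
            text = " ".join((subject, sender, body_text(msg)))
            for term in tokenize(text):
                index[term].add(pos)
    finally:
        box.close()
    return index, headers


def search(index, headers, query):
    """Return headers of messages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return [headers[pos] for pos in sorted(hits)]
```

To try it, call `index, headers = build_index("archive.mbox")` (the file name is a placeholder) and then `search(index, headers, "borges")`. A real tool would also store the index on disk so it does not have to re-read the whole archive on every run, which is the kind of work Funes hands off to Lucene.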

Comments:
Hello, this is Nelson. The email search engine Funes was actually written in 2001, before I worked at Google. A temporary blog software mishap republished my original announcement as if it were new. Sorry for the confusion!

Funes actually did work when I released it, but I don't use it. I don't think anyone does. These days I use Gmail or a little Perl script called grepmail.
 
Oh, that's why the copyright in the readme says 2001. I thought it just hadn't been updated :)

I wonder whether there will someday be a Thunderbird extension capable of matching the post date against date-like strings in the post body (I use Thunderbird as an RSS aggregator).
 