
mailpie - e-mail full text search

PURPOSE

mailpie is a suite of programs for commandline full-text search of large
e-mail archives.  Keep your inbox uncluttered while retaining the
ability to quickly find an old message that becomes relevant again.

In fact, mailpie searches much more quickly than the built-in search
facilities of many MUAs because it uses a time-tested full-text
indexer (swish-e).


REQUIREMENTS

- python  (tested with python 2.4.3 and 2.5.2)
- swish-e (tested with swish-e 2.4.3 and 2.4.5)

mailpie is developed on Linux systems.


INSTALLATION

Site-wide installation:
    $ sudo python setup.py install

Personal installation:
    $ python setup.py install --home=$HOME
Make sure that $HOME/lib/python is on PYTHONPATH.

See distutils documentation for more installation options.


USAGE

To add a mailbox full of messages to the mailpie storage:
    mailpie-add example.mbox
After adding messages with mailpie-add, the original mbox file is not
needed to perform searches.

To search for messages:
    mailpie-search from=jepler mutt
This will find messages where the From: line matches 'jepler' and the
header or body matches 'mutt'.

Available search tags are:
    subject from to cc bcc list-id message-id date header
Without a tag, the message headers and body are all searched.
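
As an illustration of how tagged and untagged terms differ, the sketch
below splits commandline terms into (tag, text) pairs.  It is not taken
from the mailpie source: the function name split_terms and the handling
of unrecognized tags are assumptions made for this example.

```python
# Hypothetical sketch; not mailpie's actual argument parser.
SEARCH_TAGS = "subject from to cc bcc list-id message-id date header".split()

def split_terms(terms):
    """Return a list of (tag, text) pairs; tag is None for untagged terms."""
    pairs = []
    for term in terms:
        tag, sep, text = term.partition("=")
        if sep and tag.lower() in SEARCH_TAGS:
            pairs.append((tag.lower(), text))
        else:
            # No recognized tag: the whole term applies to headers and body.
            pairs.append((None, term))
    return pairs

print(split_terms(["from=jepler", "mutt"]))
# prints [('from', 'jepler'), (None, 'mutt')]
```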

To rebuild the index from scratch (e.g., after an aborted mailpie-add):
    mailpie-index

For more information on commandline options, run any mailpie command
with --help (e.g., mailpie-search --help).


TIPS

Because swish-e can take a long time to merge two index files together,
mailpie uses a two-level index system.  mailpie-add puts new messages in
"index.recent".  When this index grows large enough that the merge operation
becomes slow, run "mailpie-index -im" to merge "index.recent" into the main
index (this may take quite a long time).  After that, mailpie-add will be fast
again.
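
The two-level idea can be pictured with a toy in-memory index.  This is
only a sketch of the principle and not how swish-e or mailpie store
their data; the class TwoLevelIndex and its methods are hypothetical.

```python
# Toy model of a two-level index; names and structure are assumptions.
class TwoLevelIndex:
    def __init__(self):
        self.main = {}    # word -> set of message ids (large, rarely rewritten)
        self.recent = {}  # word -> set of message ids (small, cheap to update)

    def add(self, msg_id, text):
        # New messages only touch the small "recent" level.
        for word in text.lower().split():
            self.recent.setdefault(word, set()).add(msg_id)

    def merge(self):
        # The occasional expensive step: fold "recent" into "main".
        for word, ids in self.recent.items():
            self.main.setdefault(word, set()).update(ids)
        self.recent = {}

    def search(self, word):
        # A search has to consult both levels.
        word = word.lower()
        return self.main.get(word, set()) | self.recent.get(word, set())

index = TwoLevelIndex()
index.add("msg1", "mutt configuration tips")
index.merge()
index.add("msg2", "mutt and swish-e")
print(sorted(index.search("mutt")))
# prints ['msg1', 'msg2']
```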


PRINCIPLE

mailpie-add splits mailboxes into individual messages, which are stored
in separate files according to their sha1 hash.
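
A minimal sketch of this step, written for Python 3 with the standard
mailbox and hashlib modules, might look as follows.  The function name
store_messages and the directory layout (a subdirectory named after the
first two hex digits of the hash) are assumptions, not mailpie's actual
storage format.

```python
import hashlib
import mailbox
import os

def store_messages(mbox_path, storage_dir):
    """Write each message of an mbox to its own file, named by SHA-1 digest."""
    stored = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            data = message.as_bytes()
            digest = hashlib.sha1(data).hexdigest()
            # Assumed layout: storage_dir/<first 2 hex digits>/<remaining 38>
            subdir = os.path.join(storage_dir, digest[:2])
            os.makedirs(subdir, exist_ok=True)
            path = os.path.join(subdir, digest[2:])
            if not os.path.exists(path):
                # Identical messages hash to the same name and are stored once.
                with open(path, "wb") as f:
                    f.write(data)
            stored.append(digest)
    finally:
        box.close()
    return stored
```

Naming files by content hash means that adding the same message twice
does not create a second copy.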

When indexing, mailpie-add and mailpie-index convert each message into
an xml document with markup that indicates headers that should be
indexed specially (e.g., with the <from></from> tag).  These xml
documents are then handed off to swish-e to index.
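
The conversion could be sketched as below.  Only the <from></from> tag
is confirmed by this README; the <message>, <header> and <body> element
names, the list of specially tagged headers, and the function name
message_to_xml are assumptions made for illustration.

```python
from email import message_from_string
from xml.sax.saxutils import escape

# Assumed set of specially indexed headers, mirroring the search tags.
TAGGED_HEADERS = ("subject", "from", "to", "cc", "bcc",
                  "list-id", "message-id", "date")

def message_to_xml(message):
    """Render an email.message.Message as a small XML document."""
    parts = ["<message>"]
    for name, value in message.items():
        tag = name.lower()
        text = escape(str(value))
        if tag in TAGGED_HEADERS:
            parts.append("<%s>%s</%s>" % (tag, text, tag))
        else:
            parts.append("<header>%s: %s</header>" % (escape(name), text))
    body = []
    for part in message.walk():
        # Only textual parts are included; attachments are skipped.
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            body.append(payload.decode("utf-8", "replace"))
    parts.append("<body>%s</body>" % escape("\n".join(body)))
    parts.append("</message>")
    return "\n".join(parts)

msg = message_from_string(
    "From: jepler@example.com\nSubject: mutt & swish-e\n\nhello\n")
print(message_to_xml(msg))
```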

When searching, mailpie-search invokes swish-e to find the messages
that match the given search terms.  It concatenates the matching
messages into an mbox and then optionally invokes a mailreader.
mailpie-search can also follow In-Reply-To and References headers to
find other messages in the same thread as the matching messages.
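
Thread following can be sketched as a walk over the message-ids named
in those headers.  The helper names and the by_id dictionary (mapping
Message-ID to a parsed message) are hypothetical; mailpie's own lookup
is not shown in this README.

```python
import re
from email import message_from_string

MSGID = re.compile(r"<[^<>\s]+>")

def parent_ids(message):
    """Collect the message-ids named in In-Reply-To and References."""
    ids = []
    for name in ("In-Reply-To", "References"):
        for value in message.get_all(name, []):
            ids.extend(MSGID.findall(value))
    return ids

def thread_ancestors(start_id, by_id):
    """Return start_id plus every stored message it (transitively) refers to."""
    seen = set()
    queue = [start_id]
    while queue:
        mid = queue.pop()
        if mid in seen or mid not in by_id:
            continue  # already visited, or referenced but not stored
        seen.add(mid)
        queue.extend(parent_ids(by_id[mid]))
    return seen

raw = {
    "<a@example>": "Message-ID: <a@example>\n\nfirst\n",
    "<b@example>": "Message-ID: <b@example>\nIn-Reply-To: <a@example>\n\nreply\n",
    "<c@example>": "Message-ID: <c@example>\nReferences: <a@example> <b@example>\n\nlater\n",
}
by_id = {mid: message_from_string(text) for mid, text in raw.items()}
print(sorted(thread_ancestors("<b@example>", by_id)))
# prints ['<a@example>', '<b@example>']
```

Because the walk only follows references backwards, the later reply
<c@example> is not found when starting from <b@example>.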


EFFICIENCY

When adding thousands of messages, the average rate on a 1.8GHz machine
is 60 messages per second.
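
As a rough worked example of that rate, the snippet below estimates how
long an import might take.  The function name is hypothetical and the
result is only an extrapolation from the figure quoted above; actual
times depend on the machine and the messages.

```python
def estimated_add_seconds(message_count, rate_per_second=60.0):
    """Estimate import time from the average rate quoted in this README."""
    return message_count / rate_per_second

# 10,000 messages at 60 messages per second take a little under 3 minutes.
print(round(estimated_add_seconds(10000)))
# prints 167
```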

When searching tens of thousands of messages, the time is well under 1
second for 100 results when the swish indexes are in memory.


NON-FEATURES

The following features are outside the scope of mailpie and are unlikely
to be added:

 * A Graphical User Interface

 * Support for mailbox formats other than 'mbox'


BUGS

 * When searching for threads, only messages earlier in the thread than
   the matching message are found.

 * There is no provision for removing messages from a mailpie storage area
   or from the index.


RELEASE HISTORY

v0.3:
 * mailpie-add: fix typo that caused it to error at the end

v0.2:
 * mailpie-add: improve speed by introducing two levels of index
 * mailpie-add: fix time report when there are zero tens-of-seconds
 * mailpie-search: fix -l, --after=, --before=, and --limit= to take arguments
 * mailpie-search: improve -N flag to configure mailreader
 * mailpie-index: add --merge action
 * all: improve usage messages
 * all: improve progress reporting

v0.1:
 * Initial release

COPYRIGHT

Copyright © 2008 Jeff Epler <jepler@unpythonic.net>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
                                                                          
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
                                                                          
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA