PySpamSum v1.0
==============

spamsum is a fuzzy hash specifically designed for hashing email messages
to detect if they are SPAM. The spamsum utility includes the ability to
generate the spamsum hash and check a new message against an existing set
of hashes to find a match.

pyspamsum is a Python wrapper for the core API of spamsum.

The original spamsum code has been licensed under the terms of the
Perl Artistic License. It has been slightly modified.

The original code is Copyright Andrew Tridgell <tridge@samba.org> 2002.
It forms part of Andrew's junkcode, and is available here:

    http://www.samba.org/junkcode/#spamsum

The spamsum code in this project is derived from an updated version that
was published at Linux.conf.au 2004:

    http://linux.anu.edu.au/linux.conf.au/2004/papers/junkcode/spamsum

For details on spamsum itself, please see the spamsum README:

    http://samba.org/ftp/unpacked/junkcode/spamsum/README

This Python wrapper is released under the new BSD license, and is
Copyright Russell Keith-Magee <russell@keith-magee.com> 2009.

Installation
------------

At a prompt, run:

$ python setup.py install

Usage
-----

# Import spamsum and set up some strings
>>> import spamsum
>>> s1 = "I am the very model of a modern Major-General, I've information animal and vegegtable and mineral"
>>> s2 = "I am the very model of a modern Brigadier, I've information animal and vegetable and something else"
>>> s3 = "Huh? Gilbert and Who?"

# Evaluate the edit distance between two strings
>>> spamsum.edit_distance(s1, s2)
28

# Evaluate the spamsum of some strings
>>> sum1 = spamsum.spamsum(s1)
>>> sum2 = spamsum.spamsum(s2)
>>> sum3 = spamsum.spamsum(s3)
>>> print sum1
3:kEvyc/sFIKwYclQY4MKLFE4IgunfELzIKygn:kE6Ai3KQ/MKOgWf/KZn
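
The printed value has three colon-separated fields. In the spamsum
format these are the block size followed by two signature strings, the
second computed at twice the block size. A minimal sketch of splitting
a hash into those fields follows; parse_spamsum is a hypothetical
helper for illustration, not part of the pyspamsum API.

```python
def parse_spamsum(value):
    """Split a spamsum string into (block_size, sig1, sig2).

    Hypothetical helper, not provided by pyspamsum. It assumes the
    'blocksize:sig1:sig2' layout seen in the printed hash.
    """
    # Split on the first two colons only; the signature alphabet is
    # base64-like and does not contain ':'.
    block_size, sig1, sig2 = value.split(":", 2)
    return int(block_size), sig1, sig2


fields = parse_spamsum("3:kEvyc/sFIKwYclQY4MKLFE4IgunfELzIKygn:kE6Ai3KQ/MKOgWf/KZn")
print(fields[0])  # block size of the example hash
```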

# Compare two spamsums. 0 = no match, 100 = perfect match.
>>> spamsum.match(sum1, sum1)
100
>>> spamsum.match(sum1, sum2)
66
>>> spamsum.match(sum1, sum3)
0
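
The introduction mentions checking a new message against an existing
set of hashes. One way this might be sketched on top of the wrapper is
shown here. best_match, its threshold default of 50, and the match_fn
parameter are all assumptions made for illustration; none of them are
part of pyspamsum. The scoring function is passed in so the sketch runs
without the extension installed.

```python
def best_match(candidate_sum, known_sums, match_fn, threshold=50):
    """Return (score, known_sum) for the best-scoring known hash,
    or None if no score reaches the threshold.

    Hypothetical helper. match_fn is expected to behave like
    spamsum.match: take two hash strings and return 0..100.
    The default threshold of 50 is an arbitrary illustrative value.
    """
    best = None
    for known in known_sums:
        score = match_fn(candidate_sum, known)
        # Keep the highest score that clears the threshold.
        if score >= threshold and (best is None or score > best[0]):
            best = (score, known)
    return best
```

With the wrapper installed, the call would look something like
best_match(spamsum.spamsum(message), known_sums, spamsum.match),
where known_sums is a list of previously computed spamsum strings.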