PySpamSum v1.0
==============

spamsum is a fuzzy hash designed specifically for hashing email messages
to detect whether they are spam. The spamsum utility can generate the
spamsum hash of a message and check a new message against an existing set
of hashes to find a match.

pyspamsum is a Python wrapper for the core API of spamsum.

The original spamsum code is licensed under the terms of the Perl
Artistic License. The copy included in this project has been slightly
modified.

The original code is Copyright Andrew Tridgell <tridge@samba.org> 2002.
It forms part of Andrew's junkcode, and is available here:

    http://www.samba.org/junkcode/#spamsum

The spamsum code in this project is derived from an updated version that
was published at Linux.conf.au 2004:

    http://linux.anu.edu.au/linux.conf.au/2004/papers/junkcode/spamsum

For details on spamsum itself, please see the spamsum README:

    http://samba.org/ftp/unpacked/junkcode/spamsum/README

This Python wrapper is released under the new BSD license, and is
Copyright Russell Keith-Magee <russell@keith-magee.com> 2009.

Installation
------------

At a prompt, run:

    $ python setup.py install

Usage
-----

    # Import spamsum and set up some strings
    >>> import spamsum
    >>> s1 = "I am the very model of a modern Major-General, I've information animal and vegegtable and mineral"
    >>> s2 = "I am the very model of a modern Brigadier, I've information animal and vegetable and something else"
    >>> s3 = "Huh? Gilbert and Who?"

    # Evaluate the edit distance between two strings
    >>> spamsum.edit_distance(s1, s2)
    28

    # Evaluate the spamsum of some strings
    >>> sum1 = spamsum.spamsum(s1)
    >>> sum2 = spamsum.spamsum(s2)
    >>> sum3 = spamsum.spamsum(s3)
    >>> print(sum1)
    3:kEvyc/sFIKwYclQY4MKLFE4IgunfELzIKygn:kE6Ai3KQ/MKOgWf/KZn

    # Compare two spamsums. 0 = no match, 100 = perfect match.
    >>> spamsum.match(sum1, sum1)
    100
    >>> spamsum.match(sum1, sum2)
    66
    >>> spamsum.match(sum1, sum3)
    0

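The introduction mentions checking a new message against an existing set
of hashes. A minimal sketch of that workflow is given here. The helper
name best_match and the default threshold of 50 are hypothetical and not
part of pyspamsum. The hash and match functions are passed in as
arguments, so the sketch does not assume anything about the module beyond
the calls shown above.

```python
def best_match(message, known_sums, hasher, matcher, threshold=50):
    """Return (score, known_sum) for the closest known hash, or None.

    hasher and matcher are supplied by the caller. With pyspamsum
    installed they would be spamsum.spamsum and spamsum.match.
    threshold is an arbitrary cut-off chosen for illustration only.
    """
    candidate = hasher(message)
    best = None
    for known in known_sums:
        # Scores run from 0 (no match) to 100 (perfect match).
        score = matcher(candidate, known)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, known)
    return best
```

With the strings from the session above, it could be called like this:

    >>> known = [spamsum.spamsum(s1), spamsum.spamsum(s3)]
    >>> best_match(s2, known, spamsum.spamsum, spamsum.match)

A suitable threshold depends on the messages being compared, so the
default here should be treated as a placeholder.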