[Teammetrics-discuss] Detecting possible SPAM patterns.

Sukhbir Singh sukhbir.in at gmail.com
Sat Jun 18 19:02:08 UTC 2011


Hi,

I downloaded some lists and ran some SELECT queries to look for
possible spam patterns. Here are the points I have come up with, in
addition to those you mentioned in an earlier mail.

1. Names that start with '=':
There are many names that start with '='. In fact, this is one of the
most common patterns I have seen. For example:

    =?windows-1251?B?bWVwcm14eWU=?=
    =?UTF-8?B?U3RlZmZlbiBNw7ZsbGVy?=

So if the name starts with an equals sign, we discard that name.

2. Names in upper case:
Names written entirely in capital letters are a strong indication of spam.

3. Names with the words 'lottery', 'promotion' and 'loan' in them.

4. Names that start with any of 'Mr', 'Mrs' or 'Dr'.

5. Names that have '.com' or '.net' in them. I am aware that there
can be other TLDs that indicate spam, but for our purpose, these two
should be enough.
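
To make the five name checks concrete, here is a rough Python sketch.
The function name is_spam_name is hypothetical, and a few details are
assumptions rather than settled decisions: matching is case-insensitive,
the keywords in point 3 and the titles in point 4 are matched as whole
words (so a name like "Sloan" or "Drew" is not caught), and the check
runs on the raw, undecoded name. The '=?charset?B?...?=' form in the
examples under point 1 is the RFC 2047 encoded-word syntax, so point 1
as written matches every name encoded that way.

```python
import re

# Keywords from point 3, matched as whole words (an assumption).
SPAM_WORDS = re.compile(r"\b(lottery|promotion|loan)\b")
# Titles from point 4, matched only at the start of the name.
TITLE_PREFIX = re.compile(r"(mr|mrs|dr)\b")
# Domain fragments from point 5.
DOMAINS = (".com", ".net")


def is_spam_name(name):
    """Return True if the name matches any of the five patterns."""
    name = name.strip()
    if not name:
        return False
    # 1. Name starts with '='.
    if name.startswith("="):
        return True
    # 2. Name is entirely upper case (str.isupper ignores non-letters
    #    but needs at least one letter).
    if name.isupper():
        return True
    lowered = name.lower()
    # 3. Name contains 'lottery', 'promotion' or 'loan'.
    if SPAM_WORDS.search(lowered):
        return True
    # 4. Name starts with 'Mr', 'Mrs' or 'Dr'.
    if TITLE_PREFIX.match(lowered):
        return True
    # 5. Name contains '.com' or '.net'.
    return any(domain in lowered for domain in DOMAINS)
```

Each check returns as soon as one pattern matches, so the order only
matters for speed, not for the result.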

-
Coming to the Subject field:

1. Subjects that start with '='.

2. Subjects in upper case.
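
The two subject checks could look something like this in Python. The
function name is_spam_subject is hypothetical, and the sketch assumes
we test the raw subject line as stored, without decoding it first.

```python
def is_spam_subject(subject):
    """Return True if the subject starts with '=' or is all upper case."""
    subject = subject.strip()
    # 1. Subject starts with '='.
    if subject.startswith("="):
        return True
    # 2. Subject is entirely upper case (needs at least one letter).
    return subject.isupper()
```

A subject with mixed case, such as the one on this mail, passes both
checks and is kept.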

I need your thoughts on all points.


