Re: [R] R-help "spam" detection; please help the moderators

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Tue, 01 Jun 2010 18:22:10 +0100 (BST)


Hi Joris,
The "matched a filter rule" notice is the principal reason messages are held for moderation. Please don't become anxious about it: one of the reasons we have become concerned is that people whose messages get held up do tend to worry. That is unnecessary; they, and you, are not really doing anything wrong!

As I understand it, the filter rules are set by the ethz.ch admins, and even Martin does not seem to know, in detail, what they are. Also, it is likely that the filters "learn", and they may well be "learning" from a lot of other emails received at ethz.ch which have nothing to do with R-help but which are true spam. The headers of such messages could then be folded into "Bayes spam scores", which can trigger "matched a filter rule".

As far as R-help is concerned, the situation seems to be that gmail.com and nabble.com are important triggers (though there are plenty of others). Many email addresses whose username does not end in digits are trapped in this way as well.
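
For anyone who wants to log the same address properties for held messages, here is a minimal Python sketch. It is not the ethz.ch filter (those rules are not known to us); the function name `sender_traits` and the example addresses are purely illustrative, and it expects a plain "@" address rather than the "_at_" form the archive shows.

```python
import re


def sender_traits(address: str) -> dict:
    """Return the two address properties discussed in this thread.

    Illustrative helper only: it does not reproduce any real filter rule.
    """
    # Split at the last "@" into username (local part) and domain.
    local, _, domain = address.strip().lower().rpartition("@")
    return {
        "gmail": domain == "gmail.com",
        # True when the part before the "@" ends in one or more digits.
        "ends_with_digits": re.search(r"\d+$", local) is not None,
    }


print(sender_traits("jorismeys@gmail.com"))
print(sender_traits("someuser123@gmail.com"))
```

Whether a message came via Nabble has to be recorded separately, since that shows up in the message itself rather than in the sender's address.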

In round figures, proportions I have logged myself amongst the messages held because they "matched a filter rule" are:

non-gmail, non-nabble: 30%
non-gmail, nabble    : 25%
    gmail, non-nabble: 32%
    gmail, nabble    : 13%

gmail : 45%
nabble: 38%
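
The two marginal figures are just sums over the four cells above. A minimal Python sketch of that arithmetic, with the cell values copied from the breakdown and the variable names chosen purely for illustration:

```python
# Joint percentages from the breakdown, keyed by (is_gmail, is_nabble).
cells = {
    (False, False): 30,  # non-gmail, non-nabble
    (False, True): 25,   # non-gmail, nabble
    (True, False): 32,   # gmail, non-nabble
    (True, True): 13,    # gmail, nabble
}

# The four cells should account for all held messages.
total = sum(cells.values())

# Marginal share for each trigger: add up the cells where it is present.
gmail_share = sum(pct for (gmail, _), pct in cells.items() if gmail)
nabble_share = sum(pct for (_, nabble), pct in cells.items() if nabble)

print(f"total : {total}%")
print(f"gmail : {gmail_share}%")
print(f"nabble: {nabble_share}%")
```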

The true nature of this situation is still unclear!

Ted.

On 01-Jun-10 16:00:40, Joris Meys wrote:
> Hi all,
>
> I also couldn't help but notice that some of my messages are bounced
> for the following reason:
>
> The message headers matched a filter rule
>
> I included the header of one of the messages below, but neither of
> these messages was sent through Nabble, nor does any mail address have
> digits in it.
> I also never had that before. Did you change some of the rules somehow?
>
> Cheers
> Joris
>
> -----------------------
>
> MIME-Version: 1.0
> Received: by 10.140.173.9 with HTTP; Fri, 28 May 2010 05:32:32 -0700
> (PDT)
> In-Reply-To:
> <AANLkTim9eTuY2EfynLoH2LYN7M133YTjeNcDJpkGPHJx@mail.gmail.com>
> References:
> <AANLkTikgC7V2ZbSYRWcWBUeeZm8D24qj0VqeB2z1NduD@mail.gmail.com>
> <AANLkTim9eTuY2EfynLoH2LYN7M133YTjeNcDJpkGPHJx@mail.gmail.com>
> Date: Fri, 28 May 2010 14:32:32 +0200
> Delivered-To: jorismeys_at_gmail.com
> Message-ID:
> <AANLkTimg4IDyiVhe1ek9mk6_RybjcNuU4msvWRvtSGTS@mail.gmail.com>
> Subject: Re: [R] How to get values out of a string using regular
> expressions?
> From: Joris Meys <jorismeys_at_gmail.com>
> To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
> Cc: R mailing list <r-help_at_r-project.org>
> Content-Type: multipart/alternative;
> boundary=000e0cd2295481515c0487a6b3be
>
> --000e0cd2295481515c0487a6b3be
> Content-Type: text/plain; charset=ISO-8859-1
>
>
>
> On Tue, Jun 1, 2010 at 3:25 PM, Martin Maechler
> <maechler_at_stat.math.ethz.ch> wrote:
>

>> Dear readers of R-help
>>
>> as most of you will *not* be aware, R-help has continued to work the
>> way it does only thanks to a dozen volunteers,
>> see https://stat.ethz.ch/mailman/listinfo/r-help .
>>
>> The volunteers manually moderate e-mails that "look like spam" (and
>> sometimes are and sometimes are not).
>> While much more than 90% of the spam is filtered out long before
>> a human sees it, with the increasing sophistication of spammers,
>> manual intervention has been deemed necessary and has served the
>> community very well.
>>
>> OTOH, in recent weeks, the amount of work for the volunteers has
>> increased, mainly because an increasing number of non-spam postings
>> are erroneously tagged as "possibly spam".
>> We have discussed this, done some analysis, and found
>> that most of the messages that produce a considerable amount of
>> extra work share two properties:
>>  1) they are posted via Nabble  {which *always* attaches a small
>>                                 pro-Nabble spam at the end of the
>>                                 message}
>>  2) the e-mail address of the sender is from a freemail
>>    provider, quite often 'at gmail dot com', and often the part
>>    *before* the '@' (at-sign) ends with digits.
>>
>> We hereby ask those among you who use a freemail account to
>> please no longer post via nabble.
>>
>> Thank you for your support of R-help, *the* "community mailing
>> list" of the R project since even before that project existed
>> "formally", namely since 1997-04-01,
>> today 13 years and two months.
>>
>> Martin Maechler, ETH Zurich
>> (and R-help creator and principal manager)
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

>
>
>
> --
> Joris Meys
> Statistical Consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> Coupure Links 653
> B-9000 Gent
>
> tel : +32 9 264 59 87
> Joris.Meys_at_Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 01-Jun-10                                       Time: 18:22:08
------------------------------ XFMail ------------------------------

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 01 Jun 2010 - 17:24:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Jun 2010 - 17:40:25 GMT.

