[R] Newbie: Using R to analyse Apache logs

From: Raj Mathur <raju_at_linux-delhi.org>
Date: Thu, 31 Jan 2008 19:01:05 +0530



I have a requirement to scan Apache logs and discover "exceptions". Exceptions can be of two types:

  1. A single IP generating a large amount of traffic within a given time frame (for definable values of "large" and "time frame").
  2. A single IP hitting a wide set of URLs on the server (indicates a crawler), again for definable values of "wide".

I'm a complete newbie to R (and to statistics), so the questions are:

Data massaging, tuning, etc. are not an issue. We'd be dealing with a few hundred thousand or a million records a day.
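
To make the two exception types concrete, here is a minimal sketch in base R of what such checks might look like. It assumes the logs have already been parsed into a data frame; the column names (ip, time, url), the sample records and the three thresholds are all hypothetical and chosen only for illustration. It also uses fixed, non-overlapping time windows as a simplification; a sliding window would need a different approach.

```r
# Hypothetical sample standing in for parsed Apache log records.
# The column names ip, time and url are assumptions for this sketch.
logs <- data.frame(
  ip   = c("10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2",
           "10.0.0.3", "10.0.0.3", "10.0.0.3"),
  time = as.POSIXct(c("2008-01-31 10:00:05", "2008-01-31 10:00:20",
                      "2008-01-31 10:00:40", "2008-01-31 10:00:10",
                      "2008-01-31 10:01:00", "2008-01-31 10:07:00",
                      "2008-01-31 10:13:00"), tz = "UTC"),
  url  = c("/a", "/a", "/a", "/a", "/a", "/b", "/c"),
  stringsAsFactors = FALSE
)

# Definable thresholds (illustrative values only).
window_secs  <- 60  # length of the "time frame" in seconds
max_hits     <- 2   # more hits than this per window counts as "large"
max_distinct <- 2   # more distinct URLs than this counts as "wide"

# Type 1: count hits per IP within fixed windows of window_secs seconds.
bucket <- as.numeric(logs$time) %/% window_secs
hits <- aggregate(list(hits = logs$url),
                  by = list(ip = logs$ip, bucket = bucket),
                  FUN = length)
heavy_ips <- unique(hits$ip[hits$hits > max_hits])

# Type 2: count distinct URLs requested by each IP over the whole data set.
distinct_urls <- tapply(logs$url, logs$ip, function(u) length(unique(u)))
crawler_ips <- names(distinct_urls)[distinct_urls > max_distinct]

print(heavy_ips)
print(crawler_ips)
```

Both checks are single passes over the data with aggregate() and tapply(), so they should be feasible at the scale of a day's records, though that has not been measured here.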


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 31 Jan 2008 - 13:41:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 Jan 2008 - 15:30:10 GMT.

