Re: [R] R usage for log analysis

From: Allen S. Rout <>
Date: Mon 12 Jun 2006 - 14:44:51 EST

"Gabriel Diaz" <> writes:

> and what is the correct path to do it?
> I mean, put log files in MySQL or something like that, and then
> make R use that data, or use the data from the files directly?

I haven't stuck anything in a DB yet. I'm not sure how much of the DB clue is used under the covers.

> pre-parse the log files to accommodate them to R?

Probably not; a little familiarity with the reading functions will obviate most needs to pre-parse.
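
As a rough illustration of what I mean (a sketch only; the log lines and their layout are made up for the example), read.table() can skip comment lines and treat a placeholder such as "-" as missing, which covers a lot of what a pre-parsing script would otherwise do:

```r
# Hypothetical whitespace-separated log excerpt; with a real file you would
# pass its name as the first argument instead of using `text =`.
raw <- c(
  "# exported by a hypothetical collector",
  "host1 200 512",
  "host2 404 -",
  "host1 200 2048"
)

# comment.char drops the header line; na.strings turns "-" into NA,
# so the third column is still read as a number.
hits <- read.table(text = raw, comment.char = "#", na.strings = "-",
                   stringsAsFactors = FALSE)

# Total of the third column, ignoring the missing value
sum(hits$V3, na.rm = TRUE)
```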

> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?


Don't expect a warm welcome. This community is, like all open-source communities, sharply focused on its own concerns and expertise. And, in an unusual experience for computer types, our core competencies hold little or no sway here; they don't even give us much of a leg up. Just wait until you want to do something nutso like produce a business graphic. :)

I'm working on understanding enough of R packaging and documentation to begin a 'task view' focused on systems administration, for humble submission. That might end up being mostly "log analysis"; the term can describe much of what we do, if it's stretched a bit. I'm hoping the task view will attract the teeming masses of sysadmins trapped in the mire of Gnuplot and friends.

For starters, become familiar with read.table(); with a few variations it will take care of all the

while (<>) { @blah = split(/,/); etc. etc. etc. }

you've been accustomed to doing.
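
A minimal sketch of the R equivalent, assuming comma-separated records (the sample lines here are invented for illustration):

```r
# Hypothetical comma-separated log lines; in practice you would pass a
# file name to read.table() instead of `text =`.
log_lines <- c(
  "2006-06-12,14:44:51,host1,120",
  "2006-06-12,14:45:02,host2,87",
  "2006-06-12,14:45:10,host1,301"
)

# One call reads and splits every line; each field becomes a column
# (named V1, V2, ... by default).
logs <- read.table(text = log_lines, sep = ",", stringsAsFactors = FALSE)

str(logs)
```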

Name columns; this makes it easier to think about your data.
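
Something along these lines, where the column names and sample data are hypothetical and only there to show the idea:

```r
# Hypothetical comma-separated log lines, as before.
log_lines <- c(
  "2006-06-12,14:44:51,host1,120",
  "2006-06-12,14:45:02,host2,87",
  "2006-06-12,14:45:10,host1,301"
)

# col.names assigns names at read time (names chosen for this example).
logs <- read.table(text = log_lines, sep = ",",
                   col.names = c("date", "time", "host", "bytes"),
                   stringsAsFactors = FALSE)

# Fields can now be referred to by name rather than by position.
host1_rows <- logs[logs$host == "host1", ]

# Total bytes per host
bytes_by_host <- tapply(logs$bytes, logs$host, sum)
bytes_by_host
```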


Start thinking of your data in generic sets, as opposed to specific rows. Situations which required iteration over specific rows in Perl-land fall neatly to precise assignment in R. For example, if you've got records with dates and times and you want to work with time structures:

In Perl you'd

foreach (...)
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}

or some such. In R-land, the iteration is implicit. Here's a snippet from something I'm using
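
To give the flavour, here is a small illustrative sketch (not the code from my own scripts). It assumes a data frame called foo with character columns date and time, names borrowed from the Perl fragment above, and an ISO-style date format:

```r
# Hypothetical records with separate date and time strings.
foo <- data.frame(
  date = c("2006-06-12", "2006-06-12", "2006-06-13"),
  time = c("14:44:51", "15:15:59", "08:00:00"),
  stringsAsFactors = FALSE
)

# One assignment parses every row; paste() and as.POSIXct() both work on
# whole columns, so there is no explicit loop.
foo$pdate <- as.POSIXct(paste(foo$date, foo$time),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Derived columns follow the same pattern, e.g. the hour of each record.
foo$hour <- as.integer(format(foo$pdate, "%H"))

foo
```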


You're really acting on logical columns all at once here. This is fantastically more efficient in terms of your thought processes.

Received on Mon Jun 12 15:15:59 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 12 Jun 2006 - 22:11:34 EST.
