Re: [R] R usage for log analysis

From: Allen S. Rout <asr_at_ufl.edu>
Date: Mon 12 Jun 2006 - 14:44:51 EST

"Gabriel Diaz" <gabidiaz@gmail.com> writes:

> and what is the correct path to do it?
>
> I mean, put log files in a mysql or something like that, and then
> make R use that data, using the data from the files directly?

I haven't stuck anything in a DB yet. I'm not sure how much of the DB clue is used under the covers.

> pre-parse the log files to accommodate them to R?
 

Probably not; a little familiarity with the reading functions will obviate most needs to pre-parse.

> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?

[...]

Don't expect a warm welcome. This community is like all open-source communities, sharply focused on its own concerns and expertise. And, in an unusual experience for computer types, our core competencies hold little or no sway here; they don't even give us much of a leg up. Just wait till you want to do something nutso like produce a business graphic. :)

I'm working on understanding enough of R packaging and documentation to begin a 'task view' focused on systems administration, for humble submission. That might end up being mostly "log analysis"; the term can describe much of what we do, if it's stretched a bit. I'm hoping the task view will attract the teeming masses of sysadmins trapped in the mire of Gnuplot and friends.

For starters, become familiar with read.table(); with a few variations it will take care of all the

while (<>) { @blah = split(/,/); etc. etc. etc. }

you've been accustomed to doing.
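As a rough sketch of what that looks like, here is read.table() doing the split on commas for you. The log lines, their field layout, and the host names are invented for illustration; with a real file you would pass the file name as the first argument instead of text=.

```r
# Hypothetical comma-separated log lines (date, time, host, status, bytes).
# These stand in for a real log file, which would be read by file name.
log_lines <- c(
  "2006/06/12,14:02:11,web01,200,5120",
  "2006/06/12,14:02:13,web02,404,312",
  "2006/06/12,14:02:15,web01,200,20480"
)

# One call replaces the read-a-line, split-on-comma loop.
my_data <- read.table(
  text = log_lines,
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)

# Columns arrive with default names V1..V5; numeric-looking fields
# are converted to numbers automatically.
str(my_data)
```

The sep, header, and colClasses arguments are the "few variations" most likely to matter for log files.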

Name columns; this makes it easier to think about your data.

names(my_data)<-c("column","names","can","be","assigned","to")
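To make the payoff concrete, a small sketch: once the columns have names, filters read almost like the question you are asking. The data frame and the column names below are made up for the example.

```r
# Hypothetical data frame as read.table() would return it,
# with the default V1, V2, V3 column names.
my_data <- data.frame(
  V1 = c("2006/06/12", "2006/06/12"),
  V2 = c("14:02:11", "14:02:13"),
  V3 = c(200L, 404L),
  stringsAsFactors = FALSE
)

# Assign meaningful names (assumed names, pick your own).
names(my_data) <- c("date", "time", "status")

# Select every record with an error status, referring to the column by name.
errors <- my_data[my_data$status >= 400, ]
print(errors)
```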

Start thinking of your data in generic sets, as opposed to specific rows. Situations which required iteration over specific rows in Perl-land fall neatly to precise assignment in R. For example, if you've got records with dates and times and you want to work with time structures:

In Perl you'd

foreach (...)
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}

or some such. In R-land, the iteration is implicit. Here's a snippet from something I'm using:

a$pdate<-as.POSIXct(paste(format(a$dte,"%Y/%m/%d"),a$time))

You're really acting on logical columns all at once here. This is fantastically more efficient in terms of your thought processes.
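A self-contained sketch of the same idea, assuming a data frame a whose dte column is already a Date and whose time column is a character string. The sample values, the explicit format string, and the UTC time zone are my assumptions, not part of the snippet above.

```r
# Hypothetical records: a Date column and a character time column.
a <- data.frame(
  dte = as.Date(c("2006-06-12", "2006-06-12", "2006-06-13")),
  time = c("14:02:11", "23:59:59", "00:00:01"),
  stringsAsFactors = FALSE
)

# Build one timestamp per row with no explicit loop: format(), paste(),
# and as.POSIXct() each operate on the whole column at once.
a$pdate <- as.POSIXct(
  paste(format(a$dte, "%Y/%m/%d"), a$time),
  format = "%Y/%m/%d %H:%M:%S",
  tz = "UTC"
)

# Whole-column arithmetic works the same way: seconds between
# consecutive records.
gaps <- difftime(a$pdate[-1], a$pdate[-nrow(a)], units = "secs")
print(gaps)
```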


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jun 12 15:15:59 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 12 Jun 2006 - 22:11:34 EST.
