Re: [R] R usage for log analysis

From: bogdan romocea <br44114_at_gmail.com>
Date: Tue 13 Jun 2006 - 00:38:04 EST


I wouldn't use a DBMS at all; it isn't necessary, and I don't see what you would gain from it. Instead, I would split very large log files into pieces small enough to fit in memory (see below for an example) and then process the pieces in a loop. See the list archives and the documentation if you have questions about how to read text files, count strings, etc.

#---split big files in two---
for F in *log
do
  fn=`echo "$F" | awk -F. '{print $1}'`  #file name up to the first dot
  ln=`wc -l < "$F"`                       #number of lines in the file
  forsplit=`expr $ln / 2 + 50`            #no. of lines in each chunk, tweak as needed
  echo "Splitting $F into pieces of $forsplit lines each........"
  split -l $forsplit "$F" "$fn"
done
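The "process them in a loop" step could look roughly like the sketch below. It generalizes the split to a chosen number of pieces and then handles the pieces one at a time, here only counting the lines that match a pattern with grep -c. This is a minimal sketch assuming POSIX sh and the standard wc, split and grep utilities; the function names split_into and count_in_pieces, the piece.* naming and the one-directory-per-log layout are hypothetical, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helpers (a sketch, assuming POSIX sh, wc, split, grep).

# split_into FILE K DIR: split FILE into at most K pieces of (nearly)
# equal line count, written as DIR/piece.aa, DIR/piece.ab, ...
split_into() {
  file=$1; k=$2; dir=$3
  mkdir -p "$dir"
  ln=`wc -l < "$file"`                 # number of lines in the file
  per=$(( ($ln + $k - 1) / $k ))       # ceiling(ln / k) lines per piece
  [ "$per" -gt 0 ] || per=1            # split rejects -l 0 (empty file)
  split -l "$per" "$file" "$dir/piece."
}

# count_in_pieces PATTERN DIR: process the pieces one at a time,
# print "piece count" for each one, then the grand total.
count_in_pieces() {
  pattern=$1; dir=$2; total=0
  for piece in "$dir"/piece.*
  do
    [ -e "$piece" ] || continue        # glob matched nothing
    n=`grep -c -- "$pattern" "$piece"` # prints 0 when nothing matches
    echo "$piece $n"
    total=$(( total + n ))
  done
  echo "total $total"
}
```

For example, `split_into access.log 4 access_pieces` followed by `count_in_pieces ' 404 ' access_pieces` would report the matching lines per piece and overall (the file name, piece count and pattern are only illustrations). Writing the pieces into their own directory keeps the glob from picking up the original log, which a bare prefix such as `access` would also match in `access.log`.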

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Gabriel Diaz
> Sent: Monday, June 12, 2006 9:52 AM
> To: Jean-Luc Fontaine
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] R usage for log analysis
>
> Hello
>
> Thanks all for the answers.
>
> I've had a look at the project documentation, and it seems a
> database is the way to go for log files in the GB range (normally
> between 2 and 4 GB per 15-day dump).
>
> The document http://cran.r-project.org/doc/manuals/R-data.html says
> that R loads all the data into memory to process it when using
> read.table and similar functions. Will using a database do the same?
> Currently I have no machine with more than 2 GB of memory.
>
> The moodss thing looks nice, thanks for the link. But what I need to
> do now is an offline analysis of big log files :-). I will try the
> mysql -> R route.
>
> gabi
>
>
>
> On 6/12/06, Jean-Luc Fontaine <jfontain@free.fr> wrote:
> > Allen S. Rout wrote:
> > >
> > >
> > > Don't expect a warm welcome. This community is like all open-source
> > > communities, sharply focused on its own concerns and expertise. And,
> > > in an unusual experience for computer types, our core competencies
> > > hold little or no sway here; they don't even give us much of a leg up.
> > > Just wait till you want to do something nutso like produce a business
> > > graphic. :)
> > >
> > > I'm working on understanding enough of R packaging and documentation
> > > to begin a 'task view' focused on systems administration, for humble
> > > submission. That might end up being mostly "log analysis"; the term
> > > can describe much of what we do, if it's stretched a bit. I'm hoping
> > > the task view will attract the teeming masses of sysadmins trapped in
> > > the mire of Gnuplot and friends.
> >
> > Although they don't specifically solve the problem at hand, you might
> > want to take a look at moodss and moomps (http://moodss.sourceforge.net/),
> > modular monitoring applications which use R
> > (http://jfontain.free.fr/statistics.htm), and at the moodss log module
> > (http://jfontain.free.fr/log/log.htm).
> >
> > --
> > Jean-Luc Fontaine http://jfontain.free.fr/
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
>



Received on Tue Jun 13 01:01:59 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 13 Jun 2006 - 20:12:19 EST.
