Re: [R] Transforming simulation data which is spread across many files into a barplot

From: Bert Gunter <gunter.berton_at_gene.com>
Date: Fri, 11 Jun 2010 12:02:55 -0700

Ouch! Lousy plot. Instead, plot the 50 (mean sent, mean received) pairs as a y vs x scatterplot to see the relationship.
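
A minimal sketch of such a scatterplot, assuming the per-file means are already in a data frame. The name avg, its columns sent and received, and the random values are hypothetical stand-ins for the real simulation output:

```r
# Hypothetical per-file means; in practice these would be computed
# from the 50 log files rather than generated at random.
set.seed(42)
avg <- data.frame(
  sent     = runif(50, 150, 450),
  received = runif(50, 1000, 4000)
)

# One point per file: mean received (y) against mean sent (x)
plot(avg$sent, avg$received,
     xlab = "Mean sent", ylab = "Mean received")
```

Each file contributes a single point, so any relationship between the two means shows up directly instead of being hidden in bar heights.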

Bert Gunter
Genentech Nonclinical Biostatistics    

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Hadley Wickham
Sent: Friday, June 11, 2010 11:53 AM
To: Ian Bentley
Cc: r-help_at_r-project.org
Subject: Re: [R] Transforming simulation data which is spread across many files into a barplot

On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley_at_gmail.com> wrote:
> I'm an R newbie, and I'm just trying to use some of its graphing
> capabilities, but I'm a bit stuck - basically in massaging the already
> available data into a format R likes.
>
> I have a simulation environment which produces logs, which represent a
> number of different things.  I then run a Python script on this data to
> put it in a nicer format.  Essentially, the Python script reduces the
> number of files by two orders of magnitude.
>
> What I'm left with, is a number of files, which each have two columns of
> data in them.
> The files look something like this:
> --1000.log--
> Sent Received
> 405.0 3832.0
> 176.0 1742.0
> 176.0 1766.0
> 176.0 1240.0
> 356.0 3396.0
> ...
>
> This file - called 1000.log - represents a data point at 1000. What I'd like
> to do is to use a loop, to read in 50 or so of these files, and then produce
> a stacked barplot.  Ideally, the stacked barplot would have 1 bar per file,
> and two stacks per bar.  The first stack would be the mean of the sent, and
> the second would be the mean of the received.
>
> I've used a loop to read files in R before, something like this ---
>
> for (i in 1:50){
>    tmpFile <- paste(base, i*100, ".log", sep="")
>    tmp <- read.table(tmpFile)
> }
>

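# Load data
library(plyr)

# Named vector of file paths; the names end up in the .id column
paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)

# Read every file and stack the results into one data frame
df <- ldply(paths, read.table, header = TRUE)

# Compute averages:
avg <- ddply(df, ".id", summarise,
  sent = mean(Sent),
  received = mean(Received)
)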
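
For the stacked barplot asked about in the original question, here is a minimal base-R sketch. It is only an illustration under assumptions: the three small files written to a temporary directory are made-up stand-ins for the real logs, and the names ids and means are hypothetical.

```r
# Hypothetical stand-in data: three small log files in a temp directory
base <- file.path(tempdir(), "logs")
dir.create(base, showWarnings = FALSE)
set.seed(1)
for (n in c(100, 200, 300)) {
  d <- data.frame(Sent = runif(5, 100, 500), Received = runif(5, 1000, 4000))
  write.table(d, file.path(base, paste0(n, ".log")),
              row.names = FALSE, quote = FALSE)
}

paths <- dir(base, pattern = "\\.log$", full.names = TRUE)

# Order files numerically (100, 200, ...) rather than alphabetically
ids <- as.numeric(sub("\\.log$", "", basename(paths)))
paths <- paths[order(ids)]

# 2 x n matrix: one column per file, rows are mean Sent and mean Received
means <- sapply(paths, function(p) colMeans(read.table(p, header = TRUE)))
colnames(means) <- sort(ids)

# A matrix passed to barplot() is stacked by default (beside = FALSE)
barplot(means, legend.text = rownames(means),
        xlab = "Data point", ylab = "Mean count")
```

Sorting on the numeric part of the file name keeps 1000.log after 900.log, which a plain alphabetical sort would not.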

You can read more about plyr at http://had.co.nz/plyr.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 11 Jun 2010 - 19:04:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jun 2010 - 20:00:28 GMT.