Re: [R] Transforming simulation data which is spread across many files into a barplot

From: Ian Bentley <ian.bentley_at_gmail.com>
Date: Sat, 12 Jun 2010 12:44:39 -0400

Thanks Gabor - I was able to use that for my purposes.
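
For anyone reading this in the archive, here is a rough base-R sketch of the task described in the original post quoted below: read each log, average the two columns, and draw one stacked bar per file. It is not necessarily what Gabor suggested. The temporary directory, the three example files, and the names `base`, `size` and `means` are made up so that the snippet runs by itself, and it assumes every log starts with a `Sent Received` header line as in the sample further down.

```r
# Hypothetical setup: write three small example logs to a temp directory
# so the sketch is self-contained; real use would point `base` at the logs.
base <- file.path(tempdir(), "logs_sketch")
dir.create(base, showWarnings = FALSE)
set.seed(1)
for (n in c(100, 200, 300)) {
  example <- data.frame(Sent = runif(5, 100, 500),
                        Received = runif(5, 1000, 4000))
  write.table(example, file.path(base, paste0(n, ".log")),
              row.names = FALSE, quote = FALSE)
}

# Find the logs and order them by the number in the file name
paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
size  <- as.numeric(sub("\\.log$", "", basename(paths)))
paths <- paths[order(size)]
size  <- sort(size)

# One column of means per file; the rows are Sent and Received
means <- sapply(paths, function(p) colMeans(read.table(p, header = TRUE)))
colnames(means) <- size

# Stacked bars: one bar per file, two segments per bar
barplot(means, legend.text = rownames(means),
        xlab = "Data point", ylab = "Mean count")
```

With real data, point `base` at the folder that holds the logs and drop the setup loop.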

On 11 June 2010 16:27, Bert Gunter <gunter.berton_at_gene.com> wrote:

> So two time series? Fair enough. But less is more. Plot them as separate
> series of points connected by lines, different colors for the two different
> series. Or as two trellis plots. You may also wish to overlay a smooth to
> help the reader see the "trend" (e.g., via a loess or other nonparametric
> smooth, or perhaps just a fitted line).
>
> The only part of a bar that conveys information is the top. The rest of the
> fill is "chartjunk" (Tufte's term) and distracts.
>
>
I'll keep this in mind. I am just using this chart for my own analysis
now, and probably won't include it later.
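
The line-plot alternative Bert describes could be sketched roughly as below, using base graphics. The numbers are simulated stand-ins for the 50 per-file means, not real simulation output, and the variable names and colours are arbitrary choices.

```r
# Hypothetical per-file means for 50 data points (100, 200, ..., 5000);
# in practice these would come from the averaged log files.
set.seed(42)
x        <- seq(100, 5000, by = 100)
sent     <- 0.08 * x + rnorm(50, sd = 20)
received <- 0.75 * x + rnorm(50, sd = 150)

# Two series of points joined by lines, one colour each
matplot(x, cbind(sent, received), type = "b", pch = 1, lty = 1,
        col = c("steelblue", "firebrick"),
        xlab = "Data point", ylab = "Mean count")

# Overlay a loess smooth per series to bring out the trend
fit_sent     <- loess(sent ~ x)
fit_received <- loess(received ~ x)
lines(x, predict(fit_sent),     col = "steelblue", lwd = 2)
lines(x, predict(fit_received), col = "firebrick", lwd = 2)
legend("topleft", legend = c("sent", "received"),
       col = c("steelblue", "firebrick"), lty = 1, pch = 1)
```

If the two series differ a lot in scale, two separate panels may read better than one shared axis.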

> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
> On Behalf Of Ian Bentley
> Sent: Friday, June 11, 2010 12:15 PM
> To: Bert Gunter
> Cc: r-help_at_r-project.org; Hadley Wickham
> Subject: Re: [R] Transforming simulation data which is spread across many
> files into a barplot
>
> I'm not trying to see the relation between sent and received, but rather to
> show how these grow across the increasing complexity of the 50 data points.
>
> On 11 June 2010 15:02, Bert Gunter <gunter.berton_at_gene.com> wrote:
>
> > Ouch! Lousy plot. Instead, plot the 50 (mean sent, mean received) pairs as a
> > y vs x scatterplot to see the relationship.
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> >
> >
> > -----Original Message-----
> > From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
> > On Behalf Of Hadley Wickham
> > Sent: Friday, June 11, 2010 11:53 AM
> > To: Ian Bentley
> > Cc: r-help_at_r-project.org
> > Subject: Re: [R] Transforming simulation data which is spread across
> > many files into a barplot
> >
> > On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley_at_gmail.com>
> > wrote:
> > > I'm an R newbie, and I'm just trying to use some of its graphing
> > > capabilities, but I'm a bit stuck - basically in massaging the already
> > > available data into a format R likes.
> > >
> > > I have a simulation environment which produces logs, which represent a
> > > number of different things. I then run a Python script on this data,
> > > which puts it in a nicer format. Essentially, the Python script reduces the
> > > number of files by two orders of magnitude.
> > >
> > > What I'm left with is a number of files, each of which has two columns of
> > > data in it.
> > > The files look something like this:
> > > --1000.log--
> > > Sent Received
> > > 405.0 3832.0
> > > 176.0 1742.0
> > > 176.0 1766.0
> > > 176.0 1240.0
> > > 356.0 3396.0
> > > ...
> > >
> > > This file - called 1000.log - represents a data point at 1000. What I'd like
> > > to do is to use a loop, to read in 50 or so of these files, and then produce
> > > a stacked barplot. Ideally, the stacked barplot would have 1 bar per file,
> > > and two stacks per bar. The first stack would be the mean of the sent, and
> > > the second would be the mean of the received.
> > >
> > > I've used a loop to read files in R before, something like this ---
> > >
> > > for (i in 1:50){
> > > tmpFile <- paste(base, i*100, ".log", sep="")
> > > tmp <- read.table(tmpFile)
> > > }
> > >
> >
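> > # Load data
> > library(plyr)
> >
> > # Full paths of all .log files under base, named by file name
> > paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
> > names(paths) <- basename(paths)
> >
> > # Read every file and stack the rows; the file name ends up in ".id"
> > df <- ldply(paths, read.table, header = TRUE)
> >
> > # Compute averages:
> > avg <- ddply(df, ".id", summarise,
> >   sent = mean(Sent),
> >   received = mean(Received))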
> >
> > You can read more about plyr at http://had.co.nz/plyr.
> >
> > Hadley
> >
> > --
> > Assistant Professor / Dobelman Family Junior Chair
> > Department of Statistics / Rice University
> > http://had.co.nz/
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
> --
> Ian Bentley
> M.Sc. Candidate
> Queen's University
> Kingston, Ontario
>
>
>

-- 
Ian Bentley
M.Sc. Candidate
Queen's University
Kingston, Ontario

Received on Sat 12 Jun 2010 - 16:47:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 12 Jun 2010 - 16:50:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
