Re: [R] summary statistics

From: Petr PIKAL <>
Date: Thu, 14 Feb 2008 08:17:12 +0100


One good starting point is Paul Johnson's StatsRus (the first hit in Google, and I believe it is in Rwiki too). It helped me when I started with R about 10 years ago. For me, the best way to arrange data is usually in "database form": each column is a variable (numerical, categorical, logical) and each row is a record for one particular event (sample, day, hour, minute, etc.). From this rectangular data frame you can easily choose a subset with

?subset or data[condition, ]
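
As a rough sketch of what "database form" and the two subsetting styles look like in practice (the column names RM and mgl are taken from the tapply call quoted below; the month column and all values are invented for illustration):

```r
# Hypothetical data in "database form": one row per measurement,
# one column per variable. All values are made up.
x <- data.frame(
  RM    = c(100, 100, 100, 150, 150, 150),   # river mile (grouping variable)
  month = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  mgl   = c(2.1, 2.4, 2.4, 3.0, 3.6, 3.3)    # measured value
)

# Two equivalent ways to pick the rows for one river mile
s1 <- subset(x, RM == 100)
s2 <- x[x$RM == 100, ]

# Conditions can be combined, and subset() can also select columns
high <- subset(x, RM == 150 & mgl > 3.1, select = c(month, mgl))
```

Note the comma in x[condition, ]: the condition selects rows, and the empty slot after the comma keeps all columns.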

You can use apply-like functions (see also ?aggregate, ?by) to get summaries, and you can fit various models to data frames constructed this way. You can also look at ?merge and/or ?reshape to join data frames and to convert them to other forms that are sometimes more suitable for printing.
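
A minimal sketch of these functions on a small made-up data frame may help. The names RM and mgl follow the tapply call quoted below; the values, the sites table, the rep column and the stat_mode helper are assumptions added only for illustration. Base R has no function for the statistical mode (mode() reports the storage type), so a small helper is used here.

```r
# Hypothetical data: mgl measured three times at each of two river miles
x <- data.frame(
  RM  = c(100, 100, 100, 150, 150, 150),
  mgl = c(2.1, 2.4, 2.4, 3.0, 3.6, 3.3)
)

# tapply(values, groups, function): apply the function within each group
means <- tapply(x$mgl, x$RM, mean)

# aggregate() computes the same kind of summary but returns a data frame
meds <- aggregate(mgl ~ RM, data = x, FUN = median)

# Hypothetical helper: most frequent value (first one wins on ties)
stat_mode <- function(v) {
  ux <- unique(v)
  ux[which.max(tabulate(match(v, ux)))]
}
modes <- tapply(x$mgl, x$RM, stat_mode)

# by() passes each group's rows (as a data frame) to the function
per_mile <- by(x, x$RM, function(d) summary(d$mgl))

# merge() joins two data frames on a shared column
sites <- data.frame(RM = c(100, 150), site = c("upstream", "downstream"))
xm <- merge(x, sites, by = "RM")

# reshape() long -> wide: one row per river mile, one column per replicate
x$rep <- rep(1:3, times = 2)
wide <- reshape(x, idvar = "RM", timevar = "rep", direction = "wide")
```

Printing means, meds, per_mile or wide at the prompt shows the different output shapes, which is often what decides between tapply, aggregate and by.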

Last but not least, you can look into almost any of the publications suggested on CRAN; nearly all of them include an introduction to handling data in R. Or you can go briefly through the R-intro manual ("An Introduction to R"), which you probably have installed with your R distribution, especially chapters 2-6.

For editing commands I use an external editor as a more convenient option: Tinn-R, which is easier for non-Unix people than Emacs, though probably less powerful.


Petr wrote on 13.02.2008 15:07:53:

> Jim,
> prettyR looks like it will work, but with the way that my data frame
> is set up I still cannot get what I want out of it. I am lacking in
> my knowledge of manipulating data frames and of general R programming.
> Is there a reference that will give me all of these wonderful data
> manipulation tools that previous posters in this thread have cited:
> >tapply(x$mgl, x$RM, summary)
> I read the help file for tapply and I still don't entirely understand
> it. I am a biologist trying to learn R because it serves my purposes.
> I am not a programmer; however, I see the utility enough to be right
> stubborn when it comes to learning this. I am fully aware that I am
> not competent in programming, but I do know the theory behind the
> analyses that I am performing, which, granted, are not all that
> sophisticated (mostly descriptive statistics, community data
> ordination, and time series analysis). My main complaint is that I
> don't know how to lay out a data frame in R so that it can proceed
> with the analysis. I preprocess most of my data in Excel; this is a
> convention imposed by the kind of data and the knowledge of the rest
> of the folks in my lab. Is there a succinct reference for programming
> in R (S+) on how to rearrange data once inside R, and maybe a general
> guide to matrix/data frame setup and what "way" the data should look
> for different kinds of analysis? I am learning
> and, most of the time, I can generate graphs etc. faster than in
> Excel, but sometimes I spend hours trying to "get the data in the
> right format" and then three minutes on actual coding and result
> generation. Maybe I'm out of my league, but I am not content with
> giving my data over to somebody else to do the analysis, because I
> have relatively more knowledge of how a river system works.
> Thanks, everybody, for the continuing help.
> Stephen
> On Feb 13, 2008 5:03 AM, Jim Lemon <> wrote:
> > stephen sefick wrote:
> > > Below is my data frame. I would like to compute summary statistics
> > > for mgl for each river mile (mean, median, mode). My apologies in
> > > advance: I would like to get something like the SAS printout of
> > > Univariate. I have performed an ANOVA and a Tukey LSD and I would
> > > just like the summary statistics.
> >
> > Hi Stephen,
> > Have a look at "describe" in the prettyR package. You can specify the
> > summary stats that you want, and the formatting may suit you.
> >
> > Jim
> >
> >
> --
> Let's not spend our time and resources thinking about things that are
> so little or so large that all they really do for us is puff us up and
> make us feel like gods. We are mammals, and have not exhausted the
> annoying little problems of being mammals.
> -K. Mullis
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Received on Thu 14 Feb 2008 - 07:27:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 14 Feb 2008 - 07:30:13 GMT.
