Re: [R] understanding patterns in categorical vs. continuous data

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 27 Jan 2006 - 14:07:42 EST


From: Dave Roberts
>
> You might prefer boxplot(insolation~veg_type) as a graphic. That will
> give you quantiles. To get the actual numeric values you could
>
> for (i in levels(veg_type)) {
>   print(i)
>   # results are not auto-printed inside a loop, so print() explicitly
>   print(quantile(insolation[veg_type==i]))
> }
>
> see ?quantile for more help.

If you want the five-number summaries plotted in the boxplots, just look at the stats component of the object returned by boxplot() (one column per group):

> g <- factor(rep(1:3, 10))
> y <- rnorm(30)
> res <- boxplot(y ~ g)
> str(res)

List of 6

 $ stats: num [1:5, 1:3] -1.135 -0.757 -0.536  0.499  0.996 ...
 $ n    : num [1:3] 10 10 10
 $ conf : num [1:2, 1:3] -1.1639  0.0918 -0.5208  1.6546 -1.2487 ...
 $ out  : num(0) 
 $ group: num(0) 
 $ names: chr [1:3] "1" "2" "3"
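
A minimal sketch of pulling those numbers out with labels, assuming simulated data like the above. The seed and the row labels are my own choices for illustration, not something boxplot() supplies; plot = FALSE only suppresses the drawing. Note that the first and last rows of stats are the whisker ends, which coincide with the minimum and maximum only when a group has no points flagged as outliers.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
g <- factor(rep(1:3, 10))
y <- rnorm(30)

# Compute the boxplot statistics without drawing anything
res <- boxplot(y ~ g, plot = FALSE)

# Label rows (illustrative names) and columns (group names) for readability
stats <- res$stats
dimnames(stats) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  res$names
)
stats

# Any points beyond the whiskers are listed here, with their group index
res$out
res$group
```

With the labels in place, stats["median", ] gives the per-group medians directly.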

If you just want to compute the summaries without the boxplots, use fivenum():

> tapply(y, g, fivenum)

$"1"
[1] -1.1352456 -0.7571895 -0.5360496 0.4994445 0.9956749

$"2"
[1] -1.1408493 -0.3751730 0.5668747 1.8018146 2.0019303

$"3"
[1] -2.2309983 -0.9333305 -0.3402786 0.8849042 0.9833057
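
If a single table is handier than a list, the same summaries can be collected into a matrix. This is only a sketch on simulated data; the seed and the row labels are assumptions of mine, since fivenum() itself returns an unnamed vector.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
g <- factor(rep(1:3, 10))
y <- rnorm(30)

# split() makes one vector per group; sapply() binds the results column-wise
tab <- sapply(split(y, g), fivenum)

# fivenum() returns min, lower hinge, median, upper hinge, max (unnamed)
rownames(tab) <- c("min", "lower hinge", "median", "upper hinge", "max")
tab
```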

... and if you really want the quantiles, you can do that, too:

> tapply(y, g, quantile)

$"1"

        0%        25%        50%        75%       100%
-1.1352456 -0.7391977 -0.5360496  0.3378861  0.9956749

$"2"

        0%        25%        50%        75%       100%
-1.1408493 -0.3039648  0.5668747  1.6669879  2.0019303

$"3"

        0%        25%        50%        75%       100%
-2.2309983 -0.8389260 -0.3402786  0.6746950  0.9833057

... but note how the quartiles and hinges are not necessarily the same.
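
A small illustration of the difference, using made-up vectors rather than the data above. With the default quantile() method (type = 7), the quartiles are interpolated between order statistics, while the hinges from fivenum() are medians of the lower and upper halves. For 10 values they disagree; for 9 values they happen to coincide.

```r
x10 <- 1:10
h10 <- fivenum(x10)              # hinges are 3 and 8
q10 <- quantile(x10)             # quartiles are 3.25 and 7.75
h10
q10

x9 <- 1:9
h9 <- fivenum(x9)                # hinges are 3 and 7
q9 <- quantile(x9)               # quartiles are also 3 and 7
h9
q9
```

The median, minimum and maximum always agree; only the second and fourth values can differ, and the gap shrinks as the groups get larger.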

Andy  

> Dylan Beaudette wrote:
> > Greetings,
> >
> > I have a set of bivariate data: one variable (vegetation type) which is
> > categorical, and one (computed annual insolation) which is continuous.
> > Plotting veg_type ~ insolation produces a nice overview of the patterns that
> > I can see in the source data. However, due to the large number of samples
> > (1,000), and the apparent "spread" in the distribution of a single vegetation
> > type over a range of insolation values, I am having a hard time quantitatively
> > describing the relationship between the two variables.
> >
> > Here is a link to a sample graph:
> > http://casoilresource.lawr.ucdavis.edu/drupal/node/162
> >
> > Since the data along each vegetation type "line" is not a distribution in the
> > traditional sense, I am having problems applying descriptive statistical
> > methods. Conceptually, I would like to somehow describe the variation with
> > insolation, along each vegetation type "line".
> >
> > Any guidance, or suggested reading material would be greatly appreciated.
> >
> >
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> David W. Roberts                     office 406-994-4548
> Professor and Head                   FAX    406-994-3190
> Department of Ecology                email  droberts@montana.edu
> Montana State University
> Bozeman, MT 59717-3460
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>



Received on Fri Jan 27 14:16:56 2006

This archive was generated by hypermail 2.1.8 : Fri 27 Jan 2006 - 20:04:24 EST