Re: [R] cor(data.frame) infelicities

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Mon, 3 Dec 2007 14:05:36 -0500

You are right, but I was just trying to stick to the same example. In practice it would be fine as long as it's an ordered factor; one could restrict it to columns of class "ordered".
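Below is a minimal sketch of that restriction. It assumes a hypothetical helper `cor_ordered()`, which is not part of base R. The helper keeps numeric and ordered-factor columns, turns ordered factors into their integer level codes with `data.matrix()`, and drops unordered factors. Recent versions of R reject factor columns in `cor()` with "'x' must be numeric", so the sketch converts them explicitly.

```r
# Hypothetical helper, not part of base R: rank correlations over the numeric
# and *ordered* factor columns of a data frame; unordered factors are dropped.
cor_ordered <- function(df, method = c("kendall", "spearman"), ...) {
  method <- match.arg(method)
  # Keep numeric columns and ordered factors only.
  keep <- vapply(df, function(col) is.numeric(col) || is.ordered(col),
                 logical(1))
  # data.matrix() replaces each factor by its integer level codes, so the
  # level order of an ordered factor determines the ranks.
  m <- data.matrix(df[keep])
  cor(m, method = method, ...)
}

# Species is nominal in iris, so it is dropped ...
r_nominal <- cor_ordered(iris)
# ... but kept once it is declared ordered (for illustration only).
iris_ord <- iris
iris_ord$Species <- factor(iris_ord$Species, ordered = TRUE)
r_ordered <- cor_ordered(iris_ord)
```

Declaring a factor ordered is what tells the helper that its level order carries meaning. A nominal factor never makes it into the rank computation.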

On Dec 3, 2007 1:58 PM, Liaw, Andy <andy_liaw_at_merck.com> wrote:
> I'd call that another infelicity. Species is supposed to be nominal,
> not ordinal, so rank correlation wouldn't make much sense. So what does
> cor(, method="kendall") do? It looks like it simply uses the underlying
> numeric code. (Change Species to numerics and you'll see the same
> answer.) However, reordering the levels changes the result:
>
> R> iris2 <- iris
> R> levels(iris2$Species) <- levels(iris2$Species)[c(2, 1, 3)]
> R> cor(iris2, method = "kendall")
>              Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
> Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086 0.1897778
> Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 0.1439793
> Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907 0.2677154
> Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000 0.2724843
> Species        0.18977778  0.14397927    0.2677154   0.2724843 1.0000000
>
> To me, this is dangerous!
>
> Andy
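The sketch below illustrates the danger Andy describes. It recodes the nominal `Species` factor of `iris` under two level orders and computes Kendall's tau against `Petal.Length` on the resulting integer codes. The helper name `kendall_vs_species()` is hypothetical. This is not a reconstruction of what `cor()` did internally in 2007; it only shows that any codes-based rank correlation depends on an arbitrary level order.

```r
# The integer coding of a nominal factor is arbitrary; a rank correlation
# computed on those codes changes when the level order changes.
kendall_vs_species <- function(level_order) {
  codes <- as.integer(factor(iris$Species, levels = level_order))
  cor(iris$Petal.Length, codes, method = "kendall")
}

tau_default  <- kendall_vs_species(c("setosa", "versicolor", "virginica"))
tau_permuted <- kendall_vs_species(c("versicolor", "setosa", "virginica"))
c(default = tau_default, permuted = tau_permuted)
```

The two values differ substantially, although both codings describe the same species.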
>
>
> From: Gabor Grothendieck
>
> >
> > You can calculate the Kendall rank correlation with such a matrix,
> > so you would not want to exclude factors in that case:
> >
> > > cor(iris, method = "kendall")
> >              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> > Sepal.Length   1.00000000 -0.07699679    0.7185159   0.6553086  0.6704444
> > Sepal.Width   -0.07699679  1.00000000   -0.1859944  -0.1571257 -0.3376144
> > Petal.Length   0.71851593 -0.18599442    1.0000000   0.8068907  0.8229112
> > Petal.Width    0.65530856 -0.15712566    0.8068907   1.0000000  0.8396874
> > Species        0.67044444 -0.33761438    0.8229112   0.8396874  1.0000000
> >
> >
> > On Dec 3, 2007 9:27 AM, Michael Friendly <friendly_at_yorku.ca> wrote:
> > > In using cor(data.frame), it is annoying that you have to explicitly
> > > filter out non-numeric columns, and when you don't, the error message
> > > is misleading:
> > >
> > > > cor(iris)
> > > Error in cor(iris) : missing observations in cov/cor
> > > In addition: Warning message:
> > > In cor(iris) : NAs introduced by coercion
> > >
> > > It would be nicer if stats:::cor() *itself* did the equivalent of the
> > > following for a data.frame:
> > > > cor(iris[,sapply(iris, is.numeric)])
> > > Sepal.Length Sepal.Width Petal.Length Petal.Width
> > > Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
> > > Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
> > > Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
> > > Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000
> > > >
> > >
> > > A change could be implemented at this point in the code of cor():
> > >     if (is.data.frame(x))
> > >         x <- as.matrix(x)
> > >
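The proposed behaviour can be sketched as a wrapper that drops non-numeric data-frame columns before calling `stats::cor()`. The name `cor_numeric()` and the choice to report dropped columns with `message()` are assumptions made for illustration. Neither comes from base R or from the post.

```r
# Hypothetical wrapper, not part of base R: drop non-numeric data-frame
# columns (reporting them with a message) before calling stats::cor().
cor_numeric <- function(x, ...) {
  if (is.data.frame(x)) {
    num <- vapply(x, is.numeric, logical(1))
    if (!all(num))
      message("dropping non-numeric column(s): ",
              paste(names(x)[!num], collapse = ", "))
    x <- as.matrix(x[num])
  }
  stats::cor(x, ...)
}

r <- cor_numeric(iris)   # reports that Species was dropped
```

Reporting the dropped columns instead of filtering silently keeps the wrapper from hiding a variable the user may have expected to see in the result.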
> > > Second, the default, use="all.obs", throws an error if there are any
> > > NAs. It would be nicer if the default were use="complete.obs",
> > > which would generate warnings instead. Most other statistical
> > > software is more tolerant of missing data.
> > >
> > > > library(corrgram)
> > > > data(auto)
> > > > cor(auto[,sapply(auto, is.numeric)])
> > > Error in cor(auto[, sapply(auto, is.numeric)]) :
> > > missing observations in cov/cor
> > > > cor(auto[,sapply(auto, is.numeric)],use="complete")
> > > # works; output elided
> > >
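The following sketch compares the `use` options. The corrgram `auto` data may not be installed, so it substitutes the built-in `airquality` data set, which has NAs in `Ozone` and `Solar.R`; that substitution is an assumption. In current R the default is `use = "everything"`, which returns NA for affected pairs instead of stopping. `use = "all.obs"` still raises the "missing observations" error shown in the post.

```r
# airquality (datasets package) has NAs in Ozone and Solar.R.
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# use = "all.obs" stops with an error when any value is missing.
err <- tryCatch(cor(aq, use = "all.obs"), error = conditionMessage)

r_default  <- cor(aq)                                 # NA for pairs touching Ozone/Solar.R
r_complete <- cor(aq, use = "complete.obs")           # listwise: rows with no NA
r_pairwise <- cor(aq, use = "pairwise.complete.obs")  # per pair of columns
```

With `"complete.obs"` every entry uses the same rows. `"pairwise.complete.obs"` uses all available rows for each pair, so the resulting matrix need not be positive semi-definite.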
> > > -Michael
> > >
> > > --
> > > Michael Friendly Email: friendly AT yorku DOT ca
> > > Professor, Psychology Dept.
> > > York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> > > 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> > > Toronto, ONT M3J 1P3 CANADA
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>
>



Received on Mon 03 Dec 2007 - 19:09:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 03 Dec 2007 - 22:30:16 GMT.
