[R] cor(data.frame) infelicities

From: Michael Friendly <friendly_at_yorku.ca>
Date: Mon, 03 Dec 2007 09:27:07 -0500


In using cor(data.frame), it is annoying that you have to explicitly filter out non-numeric columns, and when you don't, the error message is misleading:

> cor(iris)
Error in cor(iris) : missing observations in cov/cor
In addition: Warning message:
In cor(iris) : NAs introduced by coercion
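
For readers wondering where the "missing observations" come from, the following is a small sketch of the likely mechanism, assuming the data frame is converted with as.matrix() and then coerced to numeric. The variable name m is only for illustration.

```r
# as.matrix() on a data frame containing a factor column gives a
# character matrix, not a numeric one
m <- as.matrix(iris)
is.character(m)

# Coercing a species label to numeric yields NA (normally with the warning
# "NAs introduced by coercion"), which cor() then reports as missing data
na_value <- suppressWarnings(as.numeric("setosa"))
is.na(na_value)
```

So the NAs in the message are produced by the coercion, not present in the data.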

It would be nicer if stats::cor() *itself* did the equivalent of the following for a data.frame:
> cor(iris[,sapply(iris, is.numeric)])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

>

A change could be implemented here:

     if (is.data.frame(x))
         x <- as.matrix(x)
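
Until something like that is in stats, the same behaviour can be had with a user-level wrapper. This is a minimal sketch only: cor_numeric is a hypothetical name, not an existing function, and the warning text is made up.

```r
# Hypothetical wrapper (not part of stats): keep only the numeric columns
# of a data frame before handing it to cor()
cor_numeric <- function(x, ...) {
  if (is.data.frame(x)) {
    keep <- vapply(x, is.numeric, logical(1))
    if (!all(keep))
      warning("dropping non-numeric columns: ",
              paste(names(x)[!keep], collapse = ", "))
    x <- as.matrix(x[, keep, drop = FALSE])
  }
  stats::cor(x, ...)
}

# Species is dropped (with a warning); the four measurements remain
r <- suppressWarnings(cor_numeric(iris))
dim(r)
```

Warning about the dropped columns, rather than dropping them silently, is a design choice; a quiet version would simply omit the warning() call.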

Second, the default, use="all.obs", throws an error if there are any NAs. It would be nicer if the default were use="complete.obs", which could generate a warning instead. Most other statistical software is more tolerant of missing data.

> library(corrgram)
> data(auto)
> cor(auto[,sapply(auto, is.numeric)])
Error in cor(auto[, sapply(auto, is.numeric)]) :
  missing observations in cov/cor
> cor(auto[,sapply(auto, is.numeric)],use="complete")
# works; output elided
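
For anyone without the corrgram data at hand, here is a small self-contained sketch of the same point. The data frame d is made up for illustration. Note that use="complete" above works because the argument is partially matched to "complete.obs".

```r
# Small made-up data frame with one missing value
d <- data.frame(x = c(1, 2, 3, 4, NA),
                y = c(2, 4, 5, 4, 5),
                z = c(5, 3, 4, 1, 2))

# Listwise deletion: only rows with no NA in any column are used
r_complete <- cor(d, use = "complete.obs")

# The same thing done by hand with complete.cases()
r_manual <- cor(d[complete.cases(d), ])

# Pairwise deletion: each correlation uses every row that is
# complete for that particular pair of columns
r_pairwise <- cor(d, use = "pairwise.complete.obs")
```

Here r_complete and r_manual agree, while r_pairwise can differ for pairs such as y and z, which have no missing values and so use all five rows.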

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 03 Dec 2007 - 14:30:39 GMT

