[R] Subsetting a data frame by dropping correlated variables

From: Rita Carreira <ritacarreira_at_hotmail.com>
Date: Tue, 19 Apr 2011 19:10:34 +0000

Hello R Users!
I have a data frame that has many variables, some with missing observations, and some that are correlated with each other. I would like to subset the data by dropping one of the variables that is correlated with another variable that I will keep in the data frame. Alternatively, I could also drop both of the variables that are correlated with each other. Worry not! I am not deleting data, I am just finding a subset of the data that I can use to impute some missing observations. I have tried the following statement:
    dfQuc <- dfQ[ , sapply(dfQ, function(x) cor(dfQ, use = "pairwise.complete.obs", method ="pearson")<0.8)]
but it gives me the following error:
    Error in `[.data.frame`(dfQ, , sapply(dfQ, function(x) cor(dfQ, use = "pairwise.complete.obs", :   undefined columns selected
Since I have several dozen data frames, it is impractical for me to manually inspect the correlation matrices and select which variables to drop, so I am trying to have R make the selection for me. Does anyone have an idea of how to accomplish this? Thank you very much!
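A possible reading of the error, and one way around it, sketched under assumptions: inside the sapply() call, cor(dfQ, ...) is evaluated on the whole data frame once per column, so the result is a logical matrix with far more entries than dfQ has columns, and `[.data.frame` cannot use it as a column index. The sketch below instead computes the correlation matrix once and greedily keeps a column only if it is not highly correlated with a column already kept. The helper name drop_correlated, the argument name cutoff, the use of absolute correlation, and the toy data are all illustrative assumptions, not part of the original post; it also assumes every column is numeric.

```r
# Hypothetical helper (not from the original post): from each pair of columns
# whose absolute Pearson correlation exceeds `cutoff`, keep the earlier column
# and drop the later one. Assumes all columns of `df` are numeric.
drop_correlated <- function(df, cutoff = 0.8) {
  cm <- abs(cor(df, use = "pairwise.complete.obs", method = "pearson"))
  cm[is.na(cm)] <- 0  # pairs whose correlation is undefined count as uncorrelated
  keep <- rep(TRUE, ncol(df))
  for (j in seq_len(ncol(df))) {
    earlier <- seq_len(j - 1)
    # Drop column j if it is highly correlated with any earlier column
    # that is still being kept.
    if (j > 1 && any(cm[j, earlier][keep[earlier]] > cutoff)) {
      keep[j] <- FALSE
    }
  }
  df[, keep, drop = FALSE]
}

# Toy data standing in for dfQ: `b` is nearly a copy of `a`, `c` is unrelated.
set.seed(1)
x <- rnorm(50)
dfQ <- data.frame(a = x, b = x + rnorm(50, sd = 0.01), c = rnorm(50))
dfQ$a[3] <- NA                   # a missing observation, as in the question
dfQuc <- drop_correlated(dfQ)    # subset without the redundant column
names(dfQuc)
```

Because the rule always keeps the earlier column of a correlated pair, the column order decides which variable survives; reordering the columns first is one way to control that. For several dozen data frames stored in a list, lapply(list_of_dfs, drop_correlated) would apply the same rule to each. If the caret package is available, its findCorrelation() function offers a comparable selection based on a correlation matrix.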

Rita
=====================================
"If you think education is expensive, try ignorance." --Derek Bok


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 19 Apr 2011 - 19:12:35 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Apr 2011 - 03:10:33 GMT.
