Re: [R] How do I delete multiple blank variables from a data frame?

From: Allan Engelhardt <allane_at_cybaea.com>
Date: Sat, 19 Mar 2011 08:36:43 +0000

On 19/03/11 01:35, Joshua Wiley wrote:
> Hi Rita,
>
> This is far from the most efficient or elegant way, but:
>
> ## two column data frame, one all NAs
> d<- data.frame(1:10, NA)
> ## use apply to create logical vector and subset d
> d[, apply(d, 2, function(x) !all(is.na(x)))]

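This works, but apply() first converts d to a matrix, which is unnecessary. If performance is an issue, try

d[, sapply(d, function(x) !all(is.na(x))), drop = FALSE]

instead: apply is about 3x slower on your test data frame d, and the gap grows for larger data frames. (drop = FALSE keeps the result a data frame even when only one column survives, as happens with d.)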
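To make the matrix conversion concrete, here is a minimal sketch. The data frames d2 and frames and the helper drop.blank are made up for illustration and are not from the original thread. It shows what apply() works on internally, checks that both calls flag the same columns, and then wraps the sapply version so it can be run over a list of data frames, as in the original question.

```r
## Hypothetical data: a numeric, a character and an all-NA column
d2 <- data.frame(id = 1:5, name = letters[1:5], empty = NA,
                 stringsAsFactors = FALSE)

## apply() works on as.matrix(d2); with mixed column types that is a
## character matrix, so every value is converted before the test runs
typeof(as.matrix(d2))

## Both calls flag the same columns
keep.apply  <- apply(d2, 2, function(x) !all(is.na(x)))
keep.sapply <- sapply(d2, function(x) !all(is.na(x)))
all(keep.apply == keep.sapply)

## Hypothetical helper: drop all-NA columns, always returning a data frame
drop.blank <- function(df) {
    df[, sapply(df, function(x) !all(is.na(x))), drop = FALSE]
}

## Hypothetical list standing in for the imported data frames
frames  <- list(first  = d2,
                second = data.frame(P1 = NA, Q1 = c(2.5, NA)))
cleaned <- lapply(frames, drop.blank)
lapply(cleaned, names)
```

Using lapply over a list avoids an explicit loop and keeps each cleaned data frame under its original name.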

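For the related problem of removing columns of constant-or-NA values, the best I could come up with is

zv.1 <- function(x) {
    ## The literal approach: a column is constant-or-NA when its
    ## variance is zero or cannot be computed
    y <- var(x, na.rm = TRUE)
    return(is.na(y) || y == 0)
}
sapply(train, zv.1)  ## train stands for the data frame being screened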

See
http://www.cybaea.net/Blogs/Data/R-Eliminating-observed-values-with-zero-variance.html for the benchmarks.
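As a small worked example of how the result can be used to subset, assuming all columns are numeric (the data frame train below is invented for illustration, and zv.1 is repeated so the snippet runs on its own):

```r
## Same test as zv.1 above: TRUE for constant-or-NA columns
zv.1 <- function(x) {
    y <- var(x, na.rm = TRUE)
    is.na(y) || y == 0
}

## Hypothetical numeric data frame
train <- data.frame(a = c(1, 2, 3, 4),          # varies: keep
                    b = c(5, 5, NA, 5),         # constant apart from NA
                    c = NA_real_,               # all NA
                    d = c(0.5, 0.5, 0.5, 0.5))  # constant

zero.var    <- sapply(train, zv.1)
train.clean <- train[, !zero.var, drop = FALSE]
names(train.clean)
```

Note that the logical vector is negated here, since zv.1 marks the columns to remove rather than the ones to keep.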

Allan

> I am just apply()ing to each column (the 2) of d, the function
> !all(is.na(x)) which will return FALSE if all of x is missing and TRUE
> otherwise. The result is a logical vector the same length as the
> number of columns in d that is used to subset only the d columns with
> at least some non-missing values. For documentation see:
>
> ?apply
> ?is.na
> ?all
> ?"["
> ?Logic
>
> HTH,
>
> Josh
>
> On Fri, Mar 18, 2011 at 3:35 PM, Rita Carreira <ritacarreira_at_hotmail.com> wrote:
>> Dear List Members,
>>
>> I have 55 data frames, each with 272 variables and 267 observations. Some of these variables are blank, but the blank variables are not the same in every data frame. I would like to write a procedure in which I import a data frame, see which variables are blank, and delete those variables. My data frames have variables named P1 to P136 and Q1 to Q136.
>>
>> I have a couple of questions regarding this issue:
>>
>> 1) Is a loop an efficient way to address this problem? If not, what are my alternatives and how do I implement them?
>> 2) I have been playing with a single data frame to try to figure out a way of having R go through the columns and see which ones it should delete. I have figured out how to delete rows with missing data (newdata <- na.omit(olddata)), but how do I do it for columns?
>>
>> Thank you very much for your help and have a great weekend!
>>
>> Rita
>> ________________________________________
>> "If you think education is expensive, try ignorance" --Derek Bok
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>



Received on Sat 19 Mar 2011 - 08:40:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 21 Mar 2011 - 20:10:24 GMT.