# [R] Function for deleting variables with >=50% missing obs from a data frame

From: Rita Carreira <ritacarreira_at_hotmail.com>
Date: Fri, 15 Apr 2011 22:00:10 +0000

Hello R users!
I have several data frames where some of the variables have many missing observations. For example, Q1 in one of my data frames has over 66% of its observations missing. I have tried imputation with mice but it does not work for all the data frames and I get the following message or a similar message to this:

     iter imp variable
      1   1  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q19 Q36 Q47 Q52 Q79 Q80 Q94 Q97 Q104 Q108 Q122 Q131 Q134 P1 P2 P3 P4 P5 P6
    Error in solve.default(xtx + diag(pen)) :
      system is computationally singular: reciprocal condition number = 1.83044e-16
    In addition: Warning messages:
    1: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced
    ...
    7: In sqrt((sum(residuals^2))/(sum(ry) - ncol(x) - 1)) : NaNs produced

Note: warnings 2 to 6 suppressed by me.
I would like to try a different approach where I delete from the data frame the variables that have more than 50% missing observations (the actual percentage might change). I have already deleted the variables that were all missing, using the following code, which was kindly suggested by one of you:

    ## Data frame after removing any blank columns:
    dfQ <- dfQtemp[ , sapply(dfQtemp, function(x) !all(is.na(x)))]

Any ideas or suggestions for deleting variables with partially missing data? Thanks and have a great weekend!

Rita

"If you think education is expensive, try ignorance." -- Derek Bok

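
One possible way to drop the partially missing variables, offered as a minimal sketch rather than a definitive answer: compute the share of `NA` values in each column with `colMeans(is.na(...))` and keep only the columns at or below a cutoff. The helper name `drop_sparse_columns`, its `max_na` argument, and the toy values in `dfQtemp` are assumptions made for illustration and do not come from the original post. The sketch follows the "more than 50%" wording of the message, so a column with exactly 50% missing is kept; changing `<=` to `<` gives the ">= 50%" rule from the subject line.

```r
# Hypothetical helper (not from the original post): drop every column whose
# fraction of missing values is strictly greater than max_na.
drop_sparse_columns <- function(df, max_na = 0.5) {
  stopifnot(is.data.frame(df), max_na >= 0, max_na <= 1)
  # is.na() on a data frame returns a logical matrix; colMeans() of that
  # matrix is the fraction of NA values in each column.
  na_share <- colMeans(is.na(df))
  # drop = FALSE keeps the result a data frame even if one column survives.
  df[, na_share <= max_na, drop = FALSE]
}

# Toy data standing in for dfQtemp; the values are made up for illustration.
dfQtemp <- data.frame(
  Q1 = c(1, NA, NA, NA),    # 75% missing  -> dropped
  Q2 = c(1, 2, NA, 4),      # 25% missing  -> kept
  Q3 = c(NA, NA, 3, 4),     # 50% missing  -> kept ("more than 50%" rule)
  Q4 = c(NA, NA, NA, NA)    # 100% missing -> dropped
)

dfQ <- drop_sparse_columns(dfQtemp, max_na = 0.5)
names(dfQ)
```

Because a column that is entirely `NA` has a missing share of 1, the same call also removes the all-missing columns, so a separate `all(is.na(x))` step would not be needed for any cutoff below 1. A data frame with zero rows is not handled by this sketch, since the missing share is undefined there.
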
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Fri 15 Apr 2011 - 22:03:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 18 Apr 2011 - 22:10:31 GMT.
