Re: [R] Simple Missing cases Function

From: Tim Elwell-Sutton <tesutton_at_hku.hk>
Date: Tue, 19 Apr 2011 16:18:45 +0800

Thanks, Tyler.

This function has some useful features.

Tim

From: Tyler Rinker [mailto:tyler_rinker_at_hotmail.com]
Sent: Tuesday, April 19, 2011 3:52 PM
To: tesutton; r-help_at_r-project.org
Subject: RE: [R] Simple Missing cases Function  

I use the following function, which gives me some quick descriptives about each variable (i.e., number of missing values, percentage missing, case numbers of the missing values, etc.).
It is fairly quick; maybe not pretty, but effective on either single variables or entire data sets.

NAhunter <- function(dataset)
{
  # Summarise one variable: descriptives plus the case numbers of its NAs
  find.NA <- function(variable)
  {
    n <- length(variable)
    NAs <- is.na(variable)
    total.NA <- sum(NAs)
    percent.missing <- 100 * total.NA / n
    if (is.numeric(variable)) {
      # numeric variables also get mean, median and sd of the observed values
      descriptives <- data.frame(n,
                                 mean = mean(variable, na.rm = TRUE),
                                 median = median(variable, na.rm = TRUE),
                                 sd = sd(variable, na.rm = TRUE),
                                 total.NA, percent.missing)
      type <- "NUMERIC DATA"
    } else {
      descriptives <- data.frame(n, total.NA, percent.missing)
      type <- "CATEGORICAL DATA"
    }
    rownames(descriptives) <- " "
    # which() returns the row numbers (case numbers) of the missing values
    list(type,
         "DESCRIPTIVES" = t(descriptives),
         "CASE # OF MISSING VALUES" = which(NAs))
  }
  dataset <- data.frame(dataset)
  # NB: these change the print options for the rest of the R session
  options(scipen = 100)
  options(digits = 2)
  lapply(dataset, find.NA)
}
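If only the missing-value part of this summary is needed (percentage missing and case numbers per variable), a shorter base-R sketch is possible. The name `na_cases` is hypothetical and not part of the thread; the sketch assumes the input can be coerced with `as.data.frame()`, and the example data frame is made up for illustration.

```r
# Hypothetical helper (not from the thread): for each column, the percentage
# of NAs and the row numbers ("case numbers") where they occur.
na_cases <- function(data) {
  data <- as.data.frame(data)
  lapply(data, function(x) {
    is_na <- is.na(x)
    list(percent.missing = 100 * mean(is_na),  # mean of a logical = share of TRUEs
         cases = which(is_na))                 # positions of the NAs
  })
}

# Small made-up example with one numeric and one character column
df <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))
str(na_cases(df))
```

Because `which()` already returns the positions of the TRUE values, no intermediate table of labels is needed, and the global print options are left untouched.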

> From: tesutton@hku.hk
> To: r-help_at_r-project.org
> Date: Tue, 19 Apr 2011 15:29:08 +0800
> Subject: [R] Simple Missing cases Function
>
> Dear all
>
> I have written a function to perform a very simple but useful task which I
> do regularly. It is designed to show how many values are missing from each
> variable in a data.frame. In its current form it works but is slow because I
> have used several loops to achieve this simple task.
>
> Can anyone see a more efficient way to get the same results? Or is there an
> existing function which does this?
>
> Thanks for your help
>
> Tim
>
> Function:
>
> miss <- function (data)
> {
>   miss.list <- list(NA)
>   for (i in 1:length(data)) {
>     miss.list[[i]] <- table(is.na(data[i]))
>   }
>   for (i in 1:length(miss.list)) {
>     if (length(miss.list[[i]]) == 2) {
>       miss.list[[i]] <- miss.list[[i]][2]
>     }
>   }
>   for (i in 1:length(miss.list)) {
>     if (names(miss.list[[i]]) == "FALSE") {
>       miss.list[[i]] <- 0
>     }
>   }
>   data.frame(names(data), as.numeric(miss.list))
> }
>
> Example:
>
> data(ToothGrowth)
> data.m <- ToothGrowth
> data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
> miss(data.m)
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
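For the count-only output of Tim's miss() function, the three loops can likely be replaced by a vectorised call: `is.na()` applied to a data frame returns a logical matrix with one column per variable, and `colSums()` counts the TRUE values in each column. The name `miss2` is hypothetical; this is a sketch, not code posted in the thread.

```r
# Loop-free version of miss(): is.na() on a data frame returns a logical
# matrix with one column per variable, and colSums() counts the TRUEs.
miss2 <- function(data) {
  data.frame(variable = names(data),
             missing = colSums(is.na(data)),
             row.names = NULL)
}

# Same example as in the original post
data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA
miss2(data.m)
```

Since `sample()` draws without replacement by default, `supp` ends up with exactly 25 missing values and `len` and `dose` with none. Replacing `colSums()` with `colMeans()` would give the proportion missing instead of the count.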

Received on Tue 19 Apr 2011 - 08:21:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 19 Apr 2011 - 08:50:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
