From: David Winsemius <dwinsemius_at_comcast.net>

Date: Sun, 03 Apr 2011 14:19:40 -0400

On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:

> Quick question,
>
> I tried to find a function in available packages to find NA's for an
> entire data set (or single variables) and report the row of missing
> values (NA's for each column). I searched the typical routes
> through the blogs and the help manuals for 15 minutes. Rather than
> spend any more time searching I created my own function to do this
> (probably in less time than it would have taken me to find the
> function).
>
> Now I still have the same question: Is this function (NAhunter I
> call it) already in existence? If so please direct me (because I'm
> sure they've written better code more efficiently). I highly doubt
> I'm the first person to want to find all the missing values in a
> data set so I assume there is a function for it but I just didn't
> spend enough time looking. If there is no existing function (big if
> here), is this something people feel is worthwhile for me to put
> into a package of some sort?

I'm not sure that it would have occurred to people to include it in a package. Consider:

getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )

> cities
       long       lat         city pop
1 -58.38194 -34.59972 Buenos Aires  NA
2  14.25000  40.83333         <NA>  NA

> getNa(cities)
$long
integer(0)

$lat
integer(0)

$city
[1] 2

$pop
[1] 1 2
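For anyone who wants to reproduce the output above, here is a minimal sketch. The construction of `cities` is an assumption reconstructed from the printed values, not the original object, and the `which(..., arr.ind = TRUE)` variant is an extra illustration that returns one row per missing cell rather than a list per column.

```r
# Hypothetical reconstruction of 'cities' from the printed output above
cities <- data.frame(
  long = c(-58.38194, 14.25000),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA_real_, NA_real_),
  stringsAsFactors = FALSE
)

# Row indices of the missing values, one list element per column
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))
na_rows <- getNa(cities)

# Alternative view: a matrix with one (row, col) pair per missing cell
na_cells <- which(is.na(cities), arr.ind = TRUE)

print(na_rows)
print(na_cells)
```

The list form is convenient when you work column by column; the matrix form is easier to use for indexing back into the data frame.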

There are several packages with functions named `describe` that do most or all of the rest of what you have proposed. I happen to use Harrell's Hmisc, but the other versions should also be reviewed if you want to avoid re-inventing the wheel.
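As a rough illustration of the per-column summary part, the counts and proportions of missing values can also be had in base R. This is only a sketch: the data frame `df` and the names `n_missing`, `prop_missing` and `na_summary` are made up for the example, and the `Hmisc::describe` call runs only if that package happens to be installed.

```r
# Hypothetical example data with one numeric and one character column
df <- data.frame(
  x = c(1.5, NA, 3, 4),
  g = c("a", "b", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a logical matrix, so column sums and
# column means give the count and proportion of NAs per variable
n_missing    <- colSums(is.na(df))
prop_missing <- colMeans(is.na(df))

na_summary <- data.frame(
  n               = nrow(df),
  total.NA        = n_missing,
  percent.missing = prop_missing
)
print(na_summary)

# Fuller per-variable descriptives, if Hmisc is available
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(df))
}
```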

-- David.

Received on Sun 03 Apr 2011 - 18:31:53 GMT

>

> Tyler

> Here's the code:
>
> NAhunter<-function(dataset)
> {
>   find.NA<-function(variable)
>   {
>     if(is.numeric(variable)){
>       n<-length(variable)
>       mean<-mean(variable, na.rm=T)
>       median<-median(variable, na.rm=T)
>       sd<-sd(variable, na.rm=T)
>       NAs<-is.na(variable)
>       total.NA<-sum(NAs)
>       percent.missing<-total.NA/n
>       descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
>       rownames(descriptives)<-c(" ")
>       Case.Number<-1:n
>       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
>       missing.value<-data.frame(Case.Number,Missing.Values)
>       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
>       list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
>     }
>     else{
>       n<-length(variable)
>       NAs<-is.na(variable)
>       total.NA<-sum(NAs)
>       percent.missing<-total.NA/n
>       descriptives<-data.frame(n,total.NA,percent.missing)
>       rownames(descriptives)<-c(" ")
>       Case.Number<-1:n
>       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
>       missing.value<-data.frame(Case.Number,Missing.Values)
>       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
>       list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
>     }
>   }
>   dataset<-data.frame(dataset)
>   options(scipen=100)
>   options(digits=2)
>   lapply(dataset,find.NA)
> }
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

