On Apr 3, 2011, at 3:46 PM, Tyler Rinker wrote:

Thanks David,

*>
After seeing the simplicity of your function versus the convoluted
mess I worked up I now understand why it's not necessary to have a
package to find NA's (and from what you said is a part of other
packages such as Hmisc already).
*

I'm actually not aware that any of the `describe` variants will return the indices of NA's. In the case of a real dataset such an object could be fairly large. It was the other descriptive functions that I said were probably already coded.

*>
I am at the 2 1/2 month mark as an R user and have loads to learn.

Simpler is better. Thanks David for your time and I will take the
information you gave and put it to use in new situations.
*

You should also familiarize yourself with complete.cases() and the various functions that handle na.action parameters (linked from that help page). Note that complete.cases returns a logical vector (not the cases themselves) and is designed for indexing matrices or dataframes.
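
A minimal sketch of that distinction, using a small made-up data frame (the name `dfrm` and its values are illustrative assumptions, not data from this thread):

```r
# Hypothetical example data; the values are made up for illustration.
dfrm <- data.frame(x = c(1, NA, 3, 4),
                   y = c("a", "b", NA, "d"),
                   stringsAsFactors = FALSE)

# complete.cases() returns one logical value per row:
# TRUE when the row has no missing values.
ok <- complete.cases(dfrm)

# Use the logical vector as a row index to keep the complete rows...
complete_rows <- dfrm[ok, ]

# ...or negate it to get the row numbers holding at least one NA.
incomplete_idx <- which(!ok)

# na.omit() is one of the na.action functions; it drops incomplete rows.
omitted <- na.omit(dfrm)
```

Indexing with `ok` and calling `na.omit()` should keep the same rows here; the difference is that `na.omit()` also records which rows were dropped in an attribute of its result.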

*>
Tyler
**>
*

**> >
**> >
On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
**> >
**> > >
Quick question,
**> > >
I tried to find a function in available packages to find NA's for an
entire data set (or single variables) and report the row of missing
values (NA's for each column). I searched the typical routes
through the blogs and the help manuals for 15 minutes. Rather than
spend any more time searching I created my own function to do this
(probably in less time than it would have taken me to find the
function).
**> > >
Now I still have the same question: Is this function (NAhunter I
call it) already in existence? If so please direct me (because I'm
sure they've written better code more efficiently). I highly doubt
I'm the first person to want to find all the missing values in a
data set so I assume there is a function for it but I just didn't
spend enough time looking. If there is no existing function (big if
here), is this something people feel is worthwhile for me to put
into a package of some sort?
**> >
I'm not sure that it would have occurred to people to include it in a
package. Consider:
**> >
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
**> >
> cities
       long       lat         city pop
1 -58.38194 -34.59972 Buenos Aires  NA
2  14.25000  40.83333         <NA>  NA
> getNa(cities)
$long
integer(0)

$lat
integer(0)

$city
[1] 2

$pop
[1] 1 2
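
For anyone wanting to run the example above, here is a sketch that rebuilds a data frame resembling the printed `cities` object (the construction is an assumption reconstructed from the printout, not the original code) and also shows a base R alternative, `which(..., arr.ind = TRUE)`, that reports every missing cell as a row/column pair:

```r
# Assumed reconstruction of the printed data frame; column types are a guess.
cities <- data.frame(long = c(-58.38194, 14.25000),
                     lat  = c(-34.59972, 40.83333),
                     city = c("Buenos Aires", NA),
                     pop  = c(NA, NA),
                     stringsAsFactors = FALSE)

# The one-liner from the message: row indices of NAs, column by column.
getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x)))
na_by_column <- getNa(cities)

# Alternative view: is.na() on a data frame gives a logical matrix, and
# arr.ind = TRUE returns one (row, col) pair per missing cell.
na_positions <- which(is.na(cities), arr.ind = TRUE)
```

The list form keeps the column names attached to each set of indices, while the matrix form is handier when the positions need to be passed on for further indexing.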
**> >
There are several packages with functions by the name `describe` that
do most or all of the rest of what you have proposed. I happen to use
Harrell's Hmisc but the other versions should also be reviewed if you
want to avoid re-inventing the wheel.
**> > --
**> > David.
**> >
**> > >
Tyler
**> > >
Here's the code:
**> > >
NAhunter <- function(dataset)
{
  # Summarise one column: descriptives plus the case numbers of its NAs.
  find.NA <- function(variable)
  {
    n <- length(variable)
    NAs <- is.na(variable)
    total.NA <- sum(NAs)
    percent.missing <- total.NA/n   # a proportion between 0 and 1
    missing.cases <- which(NAs)     # case (row) numbers of the missing values
    if (is.numeric(variable)) {
      descriptives <- data.frame(n,
                                 mean = mean(variable, na.rm = TRUE),
                                 median = median(variable, na.rm = TRUE),
                                 sd = sd(variable, na.rm = TRUE),
                                 total.NA, percent.missing)
      rownames(descriptives) <- " "
      list("NUMERIC DATA", "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.cases)
    } else {
      descriptives <- data.frame(n, total.NA, percent.missing)
      rownames(descriptives) <- " "
      list("CATEGORICAL DATA", "DESCRIPTIVES" = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.cases)
    }
  }
  dataset <- data.frame(dataset)
  # Note: these calls change the printing options for the whole session.
  options(scipen = 100)
  options(digits = 2)
  lapply(dataset, find.NA)
}
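
As a rough sketch of how the count and proportion parts of the function above could be obtained with base R alone, `colSums()` and `colMeans()` can be applied to the logical matrix from `is.na()`. The data frame `dat` and its values are made up for illustration:

```r
# Hypothetical example data; the values are made up for illustration.
dat <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", NA, "z"),
                  stringsAsFactors = FALSE)

# is.na() on a data frame gives a logical matrix, so column sums are
# NA counts and column means are the proportion missing.
total_na <- colSums(is.na(dat))
prop_na  <- colMeans(is.na(dat))

# One row per column; percent.missing is a proportion, as in NAhunter.
na_summary <- data.frame(n = nrow(dat),
                         total.NA = total_na,
                         percent.missing = prop_na)
```

This gives the per-column totals in one table but not the case numbers, which still need something like `which(is.na(x))` per column.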
**> > >
**> >
*

