From: David Winsemius <dwinsemius_at_comcast.net>

Date: Sun, 03 Apr 2011 17:44:55 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 03 Apr 2011 - 22:06:33 GMT

On Apr 3, 2011, at 3:46 PM, Tyler Rinker wrote:

> Thanks David,
>
> After seeing the simplicity of your function versus the convoluted
> mess I worked up I now understand why it's not necessary to have a
> package to find NA's (and from what you said is a part of other
> packages such as Hmisc already).

I'm actually not aware that any of the `describe` variants will return the indices of NA's. In the case of a real dataset such an object could be fairly large. It was the other descriptive functions that I said were probably already coded.
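
For a large data frame, a per-column count is often enough to see where the gaps are, and it stays small however many rows there are. A minimal sketch in base R, assuming a two-row data frame rebuilt by hand from the `cities` printout quoted further down (the construction is a reconstruction for illustration, not the original object):

```r
# Hypothetical reconstruction of the `cities` example from its printed output
cities <- data.frame(
  long = c(-58.38194, 14.25000),
  lat  = c(-34.59972, 40.83333),
  city = c("Buenos Aires", NA),
  pop  = c(NA_real_, NA_real_)
)

# One count per column: compact regardless of the number of rows
na_counts <- colSums(is.na(cities))

# Row/column positions of every NA, as a two-column matrix
na_positions <- which(is.na(cities), arr.ind = TRUE)
```

The count vector has one entry per column, whereas the position matrix grows with the number of missing cells, which is the size concern mentioned above.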

> I am at the 2 1/2 month mark as an R user and have loads to learn.
> Simpler is better. Thanks David for your time and I will take the
> information you gave and put it to use in new situations.

You should also familiarize yourself with complete.cases() and the various functions that handle na.action parameters (linked from that help page). Note that complete.cases returns a logical vector (not the cases themselves) and is designed for indexing matrices or dataframes.
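
To make the logical-vector point concrete, here is a small sketch in base R. The data frame `d` is made up for illustration; `complete.cases()` and `na.omit()` are the standard functions from the stats package.

```r
# Made-up example data (not from the thread)
d <- data.frame(x = c(1, NA, 3, 4),
                y = c("a", "b", NA, "d"))

# Logical vector: TRUE where a row has no missing values
ok <- complete.cases(d)

# Use it as a row index to keep, or to locate, the complete rows
d_complete   <- d[ok, ]
rows_with_na <- which(!ok)

# na.omit() drops the same rows and records them in an "na.action" attribute
d_omitted <- na.omit(d)
```

Indexing with `ok` and calling `na.omit()` keep the same rows here; the difference is that `na.omit()` also remembers which rows it dropped.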

>
> Tyler
>
> > CC: r-help_at_r-project.org
> > From: dwinsemius_at_comcast.net
> > To: tyler_rinker_at_hotmail.com
> > Subject: Re: [R] Function for finding NA's
> > Date: Sun, 3 Apr 2011 14:19:40 -0400
> >
> > On Apr 3, 2011, at 1:44 PM, Tyler Rinker wrote:
> >
> > >
> > > Quick question,
> > >
> > > I tried to find a function in available packages to find NA's for an
> > > entire data set (or single variables) and report the row of missing
> > > values (NA's for each column). I searched the typical routes
> > > through the blogs and the help manuals for 15 minutes. Rather than
> > > spend any more time searching I created my own function to do this
> > > (probably in less time than it would have taken me to find the
> > > function).
> > >
> > > Now I still have the same question: Is this function (NAhunter I
> > > call it) already in existence? If so please direct me (because I'm
> > > sure they've written better code more efficiently). I highly doubt
> > > I'm the first person to want to find all the missing values in a
> > > data set so I assume there is a function for it but I just didn't
> > > spend enough time looking. If there is no existing function (big if
> > > here), is this something people feel is worthwhile for me to put
> > > into a package of some sort?
> >
> > I'm not sure that it would have occurred to people to include it in a
> > package. Consider:
> >
> > getNa <- function(dfrm) lapply(dfrm, function(x) which(is.na(x) ) )
> >
> > > cities
> >        long       lat         city pop
> > 1 -58.38194 -34.59972 Buenos Aires  NA
> > 2  14.25000  40.83333         <NA>  NA
> > > getNa(cities)
> > $long
> > integer(0)
> >
> > $lat
> > integer(0)
> >
> > $city
> > [1] 2
> >
> > $pop
> > [1] 1 2
> >
> > There are several packages with functions by the name `describe` that
> > do most or all of the rest of what you have proposed. I happen to use
> > Harrell's Hmisc but the other versions should also be reviewed if you
> > want to avoid re-inventing the wheel.
> > --
> > David.
> >
> > >
> > > Tyler
> > >
> > > Here's the code:
> > >
> > > NAhunter<-function(dataset)
> > > {
> > >   find.NA<-function(variable)
> > >   {
> > >     if(is.numeric(variable)){
> > >       n<-length(variable)
> > >       mean<-mean(variable, na.rm=T)
> > >       median<-median(variable, na.rm=T)
> > >       sd<-sd(variable, na.rm=T)
> > >       NAs<-is.na(variable)
> > >       total.NA<-sum(NAs)
> > >       percent.missing<-total.NA/n
> > >       descriptives<-data.frame(n,mean,median,sd,total.NA,percent.missing)
> > >       rownames(descriptives)<-c(" ")
> > >       Case.Number<-1:n
> > >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> > >       missing.value<-data.frame(Case.Number,Missing.Values)
> > >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> > >       list("NUMERIC DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> > >     }
> > >     else{
> > >       n<-length(variable)
> > >       NAs<-is.na(variable)
> > >       total.NA<-sum(NAs)
> > >       percent.missing<-total.NA/n
> > >       descriptives<-data.frame(n,total.NA,percent.missing)
> > >       rownames(descriptives)<-c(" ")
> > >       Case.Number<-1:n
> > >       Missing.Values<-ifelse(NAs>0,"Missing Value"," ")
> > >       missing.value<-data.frame(Case.Number,Missing.Values)
> > >       missing.values<-missing.value[ which(Missing.Values=='Missing Value'),]
> > >       list("CATEGORICAL DATA","DESCRIPTIVES"=t(descriptives),"CASE # OF MISSING VALUES"=missing.values[,1])
> > >     }
> > >   }
> > >   dataset<-data.frame(dataset)
> > >   options(scipen=100)
> > >   options(digits=2)
> > >   lapply(dataset,find.NA)
> > > }
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius, MD
> > West Hartford, CT
> >

David Winsemius, MD

West Hartford, CT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sun 03 Apr 2011 - 22:20:28 GMT.
