From: Joris Meys <jorismeys_at_gmail.com>

Date: Wed, 09 Jun 2010 11:33:11 +0200

> I am relatively new to R; when creating functions, I run into problems with

*> missing values. I would like my functions to ignore rows with missing values
*

*> for arguments of my function) in the analysis (as for example is the case in
*

*> STATA). Note that I don't want my function to drop rows if there are missing
*

*> arguments elsewhere in a row, ie for variables that are not arguments of my
*

*> function.
*

> As an example: here is a clustering function I wrote:

> cl <- function(dat, na.rm = TRUE, fm, cluster){

> attach( dat , warn.conflicts = F)

> library(sandwich)

> library(lmtest)

> M <- length(unique(cluster))

> N <- length(cluster)

> K <- fm$rank

> dfc <- (M/(M-1))*((N-1)/(N-K))

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,

> cluster, sum)) ) );

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

> coeftest(fm, vcovCL)

*> }
*

> When I run my function, I get the message:

> Error in tapply(x, cluster, sum) : arguments must have same length

> If I specify instead attach(na.omit(dat), warn.conflicts = F) and don't

*> have the "na.rm=TRUE" argument, then my function runs; but only for the rows
*

*> where there are no missing values AT ALL; however, I don't care if there are
*

*> missing values for variables on which I am not applying my function.
*

> For example, I have information on children's size; if I want regress scores

*> on age and parents' education, clustering on class, I would like missing
*

*> values in size not to interfere (ie if I have scores, age, parents'
*

*> education, and class, but not size, I don't want to drop this observation).
*

> I tried to look at the code of "lm" to see how the na.action part works, but

*> I couldn't figure it out... This is exactly how I would like to deal with
*

*> missing values.
*

> I tried to write

> cl <- function(dat, fm, cluster, na.action){

> attach( dat , warn.conflicts = F)

> library(sandwich)

> library(lmtest)

> M <- length(unique(cluster))

> N <- length(cluster)

> K <- fm$rank

> dfc <- (M/(M-1))*((N-1)/(N-K))

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,

> cluster, sum)) ) );

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

> coeftest(fm, vcovCL)

*> }
*

> attr(cl,"na.action") <- na.exclude

> but it still didn't work...

> Any ideas of how to deal with this issue?

> Thank you for your answers!

> Edmund

> [[alternative HTML version deleted]]

*> R-help_at_r-project.org mailing list
*

*> https://stat.ethz.ch/mailman/listinfo/r-help
*

*> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
*

*> and provide commented, minimal, self-contained, reproducible code.
*

>

Date: Wed, 09 Jun 2010 11:33:11 +0200

It's difficult to help you if we don't know what the data looks like.
Two more tips :

- look at ?with instead of attach(). The latter causes a lot of
trouble further on.

- use require() instead of library(). See also the help files on this.

I also wonder what you're doing with the na.rm=TRUE. I don't see how it affects the function, as it is used nowhere. Does it really remove NA?

This said: estfun takes the estimation function from your model object. Your model is essentially fit on all observations that are not NA, whereas I presume your cluster contains all observations, including NA. Now I don't know what kind of model fm represents, but normally there is information in the model object about the removed observations. I illustrate this using lm :

*> x <- rnorm(100)
**> y <- c(rnorm(90),NA,rnorm(9))
*

> test <- lm(x~y)

> str(test)

List of 13

... (tons of information)

$ model :'data.frame': 99 obs. of 2 variables:

... (more tons of information) ..- attr(*, "na.action")=Class 'omit' Named int 91 .. .. ..- attr(*, "names")= chr "91"

- attr(*, "class")= chr "lm"

Now we know that we can do :

> not <-attr(test$model,"na.action")

> y[-not]

So try this

**### NOT TESTED
**

cl <- function(dat, na.rm = TRUE, fm, cluster){
require(sandwich)

require(lmtest)

not <- attr(fm$model,"na.action")

cluster <- cluster[-not]

M <- length(unique(cluster)) N <- length(cluster) K <- fm$rank

dfc <- (M/(M-1))*((N-1)/(N-K))

uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x, cluster, sum)) ) );

vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

coeftest(fm, vcovCL)

} )

}

Cheers

Joris

On Wed, Jun 9, 2010 at 12:06 AM, edmund jones <edmund.j.jones_at_gmail.com> wrote:

*> Hi,
*

>

> I am relatively new to R; when creating functions, I run into problems with

>

> As an example: here is a clustering function I wrote:

> >

> cl <- function(dat, na.rm = TRUE, fm, cluster){

>

> attach( dat , warn.conflicts = F)

>

> library(sandwich)

>

> library(lmtest)

>

> M <- length(unique(cluster))

>

> N <- length(cluster)

>

> K <- fm$rank

>

> dfc <- (M/(M-1))*((N-1)/(N-K))

>

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,

> cluster, sum)) ) );

>

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

>

> coeftest(fm, vcovCL)

>

> >

> When I run my function, I get the message:

> >

> Error in tapply(x, cluster, sum) : arguments must have same length

> >

> If I specify instead attach(na.omit(dat), warn.conflicts = F) and don't

> >

> For example, I have information on children's size; if I want regress scores

> >

> I tried to look at the code of "lm" to see how the na.action part works, but

> >

> I tried to write

>

> cl <- function(dat, fm, cluster, na.action){

>

> attach( dat , warn.conflicts = F)

>

> library(sandwich)

>

> library(lmtest)

>

> M <- length(unique(cluster))

>

> N <- length(cluster)

>

> K <- fm$rank

>

> dfc <- (M/(M-1))*((N-1)/(N-K))

>

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,

> cluster, sum)) ) );

>

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

>

> coeftest(fm, vcovCL)

>

>

> attr(cl,"na.action") <- na.exclude

> >

> but it still didn't work...

> >

> Any ideas of how to deal with this issue?

>

> Thank you for your answers!

>

> Edmund

>

> [[alternative HTML version deleted]]

>

> ______________________________________________

>

-- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 Joris.Meys_at_Ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Wed 09 Jun 2010 - 09:36:06 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Wed 09 Jun 2010 - 09:50:28 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*