Re: [R] how to ignore rows missing arguments of a function when creating a function?

From: Joris Meys <jorismeys_at_gmail.com>
Date: Wed, 09 Jun 2010 11:33:11 +0200

It's difficult to help you if we don't know what the data looks like. Two more tips :
- look at ?with instead of attach(). The latter causes a lot of trouble further on.
- use require() instead of library(). See also the help files on this.

I also wonder what you're doing with the na.rm=TRUE. I don't see how it affects the function, as it is used nowhere. Does it really remove NA?

This said: estfun takes the estimation function from your model object. Your model is essentially fit on all observations that are not NA, whereas I presume your cluster contains all observations, including NA. Now I don't know what kind of model fm represents, but normally there is information in the model object about the removed observations. I illustrate this using lm :

> x <- rnorm(100)
> y <- c(rnorm(90),NA,rnorm(9))
> test <- lm(x~y)
> str(test)

List of 13
 ... (tons of information)
 $ model :'data.frame': 99 obs. of 2 variables:

  ... (more tons of information)
  ..- attr(*, "na.action")=Class 'omit'  Named int 91
  .. .. ..- attr(*, "names")= chr "91"

Now we know that we can do :
> not <-attr(test$model,"na.action")
> y[-not]

So try this
### NOT TESTED
cl <- function(dat, na.rm = TRUE, fm, cluster){ require(sandwich)
require(lmtest)

not <- attr(fm$model,"na.action")
cluster <- cluster[-not]

with( dat ,{

    M <- length(unique(cluster))
    N <- length(cluster)
    K <- fm$rank

    dfc <- (M/(M-1))*((N-1)/(N-K))

    uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,     cluster, sum)) ) );

    vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)

    coeftest(fm, vcovCL)
    } )
}

Cheers
Joris

On Wed, Jun 9, 2010 at 12:06 AM, edmund jones <edmund.j.jones_at_gmail.com> wrote:
> Hi,

>

> I am relatively new to R; when creating functions, I run into problems with
> missing values. I would like my functions to ignore rows with missing values
> for arguments of my function) in the analysis (as for example is the case in
> STATA). Note that I don't want my function to drop rows if there are missing
> arguments elsewhere in a row, ie for variables that are not arguments of my
> function.
>

> As an example: here is a clustering function I wrote:
>
>

> cl <- function(dat, na.rm = TRUE, fm, cluster){
>

> attach( dat , warn.conflicts = F)
>

> library(sandwich)
>

> library(lmtest)
>

> M <- length(unique(cluster))
>

> N <- length(cluster)
>

> K <- fm$rank
>

> dfc <- (M/(M-1))*((N-1)/(N-K))
>

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>

> coeftest(fm, vcovCL)
>

> }
>
>

> When I run my function, I get the message:
>
>

> Error in tapply(x, cluster, sum) : arguments must have same length
>
>

> If I specify instead attach(na.omit(dat), warn.conflicts = F)  and don't
> have the "na.rm=TRUE" argument, then my function runs; but only for the rows
> where there are no missing values AT ALL; however, I don't care if there are
> missing values for variables on which I am not applying my function.
>
>

> For example, I have information on children's size; if I want regress scores
> on age and parents' education, clustering on class, I would like missing
> values in size not to interfere (ie if I have scores, age, parents'
> education, and class, but not size, I don't want to drop this observation).
>
>

> I tried to look at the code of "lm" to see how the na.action part works, but
> I couldn't figure it out... This is exactly how I would like to deal with
> missing values.
>
>

> I tried to write
>

> cl <- function(dat, fm, cluster, na.action){
>

> attach( dat , warn.conflicts = F)
>

> library(sandwich)
>

> library(lmtest)
>

>  M <- length(unique(cluster))
>

>  N <- length(cluster)
>

>  K <- fm$rank
>

>  dfc <- (M/(M-1))*((N-1)/(N-K))
>

> uj <- data.frame(apply(estfun(fm),2, function(x) data.frame(tapply(x,
> cluster, sum)) ) );
>

> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
>

>  coeftest(fm, vcovCL)
>

> }
>

>  attr(cl,"na.action") <- na.exclude
>
>

> but it still didn't work...
>
>

> Any ideas of how to deal with this issue?
>

> Thank you for your answers!
>

> Edmund
>

>        [[alternative HTML version deleted]]
>

> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys_at_Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 09 Jun 2010 - 09:36:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 09 Jun 2010 - 09:50:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive