From: edmund jones <edmund.j.jones_at_gmail.com>

Date: Wed, 09 Jun 2010 00:06:23 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 08 Jun 2010 - 22:08:33 GMT

Hi,

I am relatively new to R. When writing functions, I run into problems with missing values. I would like my functions to ignore rows that have missing values in the variables passed as arguments (as is the case in Stata, for example). I do not want rows dropped because of missing values elsewhere in the row, i.e. in variables that are not arguments of my function.

As an example: here is a clustering function I wrote:

cl <- function(dat, na.rm = TRUE, fm, cluster){
  attach(dat, warn.conflicts = FALSE)
  library(sandwich)
  library(lmtest)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj <- data.frame(apply(estfun(fm), 2, function(x) data.frame(tapply(x, cluster, sum))))
  vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
  coeftest(fm, vcovCL)
}

When I run my function, I get the message:

Error in tapply(x, cluster, sum) : arguments must have same length

If I instead use attach(na.omit(dat), warn.conflicts = F) and leave out the "na.rm=TRUE" argument, my function runs, but only on the rows where there are no missing values AT ALL. I don't care about missing values in variables to which I am not applying my function.

For example, I have information on children's size. If I want to regress scores on age and parents' education, clustering on class, I would like missing values in size not to interfere (i.e. if I have scores, age, parents' education, and class, but not size, I don't want to drop this observation).
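A minimal sketch of the behaviour described above, using made-up data. The data frame `kids`, its column names, and the vector `used` are hypothetical and only illustrate the row selection; they are not the real data.

```r
# Hypothetical data: 'size' has missing values that should not matter here.
kids <- data.frame(
  score = c(10, 12, NA, 15, 11, 14),
  age   = c(7, 8, 7, NA, 9, 8),
  educ  = c(12, 16, 12, 14, 10, 16),
  class = c("a", "a", "b", "b", "c", "c"),
  size  = c(NA, 120, 118, 125, NA, 130)
)

# Only the regression variables and the clustering variable.
used <- c("score", "age", "educ", "class")

# Keep rows that are complete on the used variables, ignoring 'size'.
keep <- complete.cases(kids[, used])
kids_used <- kids[keep, ]

nrow(kids_used)      # 4: rows 3 and 4 are dropped (missing score / age)
nrow(na.omit(kids))  # 2: rows with a missing 'size' are dropped as well
```

The point of the sketch is the difference between the two counts: checking only the used columns keeps observations whose sole missing value is in size.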

I tried to look at the code of "lm" to see how the na.action part works, but I couldn't figure it out... This is exactly how I would like to deal with missing values.
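For reference, a small sketch of what a fitted `lm` object records when it drops rows, and why a full-length cluster vector then stops lining up with the per-observation values. The data frame `d` and the names `dropped` and `cl_used` are made up for illustration; this only shows the stored `na.action` component and is not a complete fix for the function above.

```r
# Hypothetical data with one missing response and one missing regressor.
d <- data.frame(
  y  = c(1.0, 2.1, NA, 3.9, 5.2, 6.1),
  x  = c(1, 2, 3, NA, 5, 6),
  cl = c("a", "a", "b", "b", "c", "c")
)

# na.omit is passed explicitly so the sketch does not depend on options().
fm <- lm(y ~ x, data = d, na.action = na.omit)

# Row indices that lm() dropped because y or x was missing.
dropped <- as.integer(na.action(fm))

# The fit uses fewer rows than the data frame has.
length(residuals(fm))  # 4
length(d$cl)           # 6

# Align the cluster vector with the rows that were actually used.
# The guard matters: with no dropped rows, d$cl[-integer(0)] would be empty.
cl_used <- if (length(dropped) > 0) d$cl[-dropped] else d$cl
```

With the cluster vector subset this way, it has the same length as the residuals, which is the condition that `tapply` complains about in the error message.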

I tried to write

cl <- function(dat, fm, cluster, na.action){
  attach(dat, warn.conflicts = FALSE)
  library(sandwich)
  library(lmtest)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj <- data.frame(apply(estfun(fm), 2, function(x) data.frame(tapply(x, cluster, sum))))
  vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
  coeftest(fm, vcovCL)
}

attr(cl,"na.action") <- na.exclude

Any ideas of how to deal with this issue?

Edmund
