From: Bert Gunter <gunter.berton_at_gene.com>

Date: Mon, 31 Jan 2011 11:14:16 -0800

E-Mail: (Ted Harding) <ted.harding_at_wlandres.net> Fax-to-email: +44 (0)870 094 0861

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

R-help_at_r-project.org mailing list

Received on Mon 31 Jan 2011 - 19:17:16 GMT


Ted et al.:

That solution is for (in "missing data language") MCAR (Missing Completely At Random), i.e. the probability of being missing does not depend on any of the variables in the data.
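For contrast, MCAR missingness can be generated in R with one constant probability for every case. This is a minimal sketch; the data frame `dat`, its covariates and the 20% rate `p.miss` are made-up assumptions for illustration, not part of the original thread.

```r
set.seed(1)
N   <- 1000
dat <- data.frame(X1 = rnorm(N), X2 = rnorm(N))
dat$Y <- 1 + 2 * dat$X1 - dat$X2 + rnorm(N)

p.miss <- 0.2                        # hypothetical: same probability for every case
Z <- as.integer(runif(N) < p.miss)   # 1 = missing; ignores X1, X2 and Y entirely
dat$Y[Z == 1] <- NA

mean(is.na(dat$Y))                   # roughly p.miss
```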

For MAR (Missing At Random), the probability of being missing may depend on the values of covariates but must not depend on the value of the outcome variable.

- Not quite right, I'm afraid. The Rubin-Little classification is that MAR means the probability of being missing does not depend on the missing values themselves, given the observed data. In the common longitudinal setting, this means that whether the current outcome is missing CAN depend on past _observed_ outcomes but NOT on the missing outcomes (past, current or future). For example, when missingness is due to dropout triggered by thresholds of observed disease progression, the data are MAR, and likelihood-based inference works, provided there are no shared parameters between the missing-data and observed-data likelihoods.
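A minimal R sketch of the kind of dropout described above, under a made-up setting: 200 hypothetical subjects, 5 visits, a cumulative disease score, and an arbitrary progression threshold of 3. A subject drops out after the first visit at which the observed score exceeds the threshold. Because visits are processed in order, the value checked is always one that was observed, so whether visit v is missing depends only on earlier observed values.

```r
set.seed(2)
n.subj    <- 200
n.visit   <- 5
threshold <- 3     # hypothetical progression threshold

# Complete data: cumulative score per subject (rows) across visits (columns)
full <- t(apply(matrix(rnorm(n.subj * n.visit, mean = 0.5), nrow = n.subj),
                1, cumsum))

obs <- full
for (i in seq_len(n.subj)) {
  for (v in 2:n.visit) {
    if (obs[i, v - 1] > threshold) {   # decision uses an observed value only
      obs[i, v:n.visit] <- NA          # monotone dropout from visit v onward
      break
    }
  }
}

colMeans(is.na(obs))   # proportion missing at each visit
```

Since the dropout rule never looks at unobserved scores, this mechanism is MAR in the sense above; assuming a correctly specified outcome model and distinct parameters, a likelihood-based analysis of all observed visits would not need to model the dropout itself.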

Cheers,

Bert

So the way to generate MAR, for data where there are covariates X1, X2, ..., Xk (and outcome Y), is to set up a function P (it could be any function) of some or all of X1, X2, ..., Xk taking values in [0,1] (endpoints included), and then set a "missing" variable Z to be 0 (not missing) or 1 (missing), with the probability that Z = 1 given by the value of P for that case.

So, if M is a data matrix with columns X1, ... , Xk , Y where each row is a case, use apply() to evaluate the function P() for each row in terms of (X1,X2,...,Xk).
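The apply() step might look like the following sketch. The particular P() (a logistic function of X1 and X2), its coefficients and the simulated matrix M with k = 2 are hypothetical; any function of the covariates returning values in [0,1] fits the recipe, as long as it never uses Y.

```r
# Hypothetical P(): a logistic function of X1 and X2 (never of Y)
P <- function(x) plogis(-1 + 0.8 * x[["X1"]] - 0.5 * x[["X2"]])

set.seed(3)
N <- 500
M <- cbind(X1 = rnorm(N), X2 = rnorm(N))
M <- cbind(M, Y = 1 + M[, "X1"] + M[, "X2"] + rnorm(N))

# One probability per row (case), computed from the covariate columns only
p <- apply(M[, c("X1", "X2")], 1, P)
summary(p)
```

apply() passes each row to P() as a named numeric vector, which is why P() can index it by column name. For a P() built from vectorised arithmetic like this one, plogis(-1 + 0.8 * M[, "X1"] - 0.5 * M[, "X2"]) gives the same p without the row-by-row loop.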

You then get a vector p = c(p.1, p.2, ... , p.N) of values of P for the N rows of M. At this point:

Z <- 1*( runif(N) <= p )

creates a vector of 0s and 1s that can be used as Missing At Random indicators (1 = missing).
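Putting the pieces together, the sketch below uses Z to delete values of the outcome. The single covariate X1, the logistic P() and its coefficients are assumptions for illustration. Because missingness depends on X1 only, the complete-case mean of Y is biased, while a complete-case regression of Y on X1 should still recover the true slope (1 in this simulation).

```r
set.seed(4)
N  <- 2000
X1 <- rnorm(N)
Y  <- 2 + X1 + rnorm(N)             # true slope of Y on X1 is 1

p  <- plogis(-1 + 1.5 * X1)         # hypothetical P(): uses X1, never Y
Z  <- 1 * (runif(N) <= p)           # 1 = missing, 0 = observed
Y.obs <- ifelse(Z == 1, NA, Y)      # delete Y where Z == 1

mean(Y)                             # full-data mean
mean(Y.obs, na.rm = TRUE)           # complete-case mean: lower, as high-X1 cases are dropped more often
coef(lm(Y.obs ~ X1))                # lm() drops the NA rows; slope stays near 1
```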

Ted.


Date: 31-Jan-11 Time: 10:17:20


--

Bert Gunter

Genentech Nonclinical Biostatistics

467-7374

http://devo.gene.com/groups/devo/depts/ncb/home.shtml



Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 31 Jan 2011 - 19:20:10 GMT.
