Re: [R] Problem with NA data when computing standard error

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Tue, 08 Apr 2008 16:00:57 -0500

On Tue, Apr 8, 2008 at 12:44 PM, LeCzar <sirnixu_at_gmail.com> wrote:
>
> Hey,
>
> I want to compute means and standard errors as two tables like this:
>
> se<-function(x)sqrt(var(x)/length(x))
>
>

The missing values are not your main problem.

When it is given a matrix or data frame, the command var computes the variance-covariance matrix. The off-diagonal covariance values can be negative, so taking the square root of the whole matrix is a mistake.

For example, run

> example(var)

to get some matrices to work with.

> C1[3,4] <- NA
> C1[3,5] <- NA

Observe that you can calculate

> var(C1, na.rm=TRUE)

but taking sqrt of that result is not useful, because sqrt would be applied to the negative covariances as well and return NaN for them (with a warning).
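
A minimal sketch of that point, using a small made-up matrix (the data and the names m, v and s are illustrative assumptions, not part of the original question):

```r
# Hypothetical data: two columns that move in opposite directions.
m <- cbind(a = c(1, 2, 3, 4), b = c(8, 6, 5, 1))

v <- var(m)            # 2 x 2 variance-covariance matrix
v["a", "b"] < 0        # TRUE: the off-diagonal covariance is negative

# sqrt() of a negative number gives NaN (with a warning), so the
# off-diagonal entries of sqrt(v) are not usable.
s <- suppressWarnings(sqrt(v))
is.nan(s["a", "b"])    # TRUE
```

Only the diagonal of v holds variances, and those are the entries a standard error needs.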

To get the standard errors, you need to reconsider the problem and extract only the variances, with something like

> diag(var(C1, na.rm=TRUE))

That will give the diagonal elements, which are the variances and cannot be negative, so

> sqrt(diag(var(C1, na.rm=TRUE)))

works as well and gives the standard deviation of each column.
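
As a quick check of the diagonal idea, the sketch below compares it with calling sd on each column. The matrix m is made-up, complete data chosen for illustration only:

```r
# Hypothetical complete data (no missing values).
m <- cbind(a = c(1, 2, 3, 4), b = c(8, 6, 5, 1))

sd_from_var <- sqrt(diag(var(m)))   # square roots of the variances
sd_direct   <- apply(m, 2, sd)      # sd() applied to each column

all.equal(sd_from_var, sd_direct)   # TRUE
```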

But you have the separate problem of dividing each standard deviation by the square root of the number of observations, and because of the missing values that number is not the same for every column. Maybe somebody knows a smarter way, but this appears to give the correct count of non-missing values in each column:

validX <- colSums( ! is.na(C1))

and this gives the square roots of those counts:

sqrt(validX)
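
To see what the counting step does, here is a small sketch on a made-up matrix with two missing values in one column (m is an illustrative assumption):

```r
# Hypothetical matrix with two missing values in column b.
m <- cbind(a = c(1, 2, 3, 4), b = c(8, NA, 5, NA))

!is.na(m)                       # logical matrix, TRUE where a value is present
validX <- colSums(!is.na(m))    # a = 4, b = 2
sqrt(validX)                    # per-column sqrt(n) for the denominator
```

colSums treats TRUE as 1 and FALSE as 0, so it counts the observed values column by column.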

Putting that together, it seems to me you could try

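se <- function(x) {
    # Column standard deviations: square roots of the diagonal of the
    # variance-covariance matrix.
    myDiag <- sqrt(diag(var(x, na.rm=TRUE)))
    # Number of non-missing values in each column.
    validX <- colSums(!is.na(x))
    myDiag/sqrt(validX)
}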

That works for me:

> se(C1)

       Fertility      Agriculture      Examination        Education
       50.740226       110.808614        39.390611        39.303898
        Catholic Infant.Mortality
      328.272207         4.513863
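
One thing worth checking: for matrix input, var(x, na.rm=TRUE) drops every row that contains an NA before computing anything, so the variances may be based on fewer rows than colSums(!is.na(x)) reports for a given column. A possible alternative, sketched here under the assumption that each column should use all of its own non-missing values, is to work column by column. The name se_by_column and the data are hypothetical:

```r
# Hypothetical helper: standard error of the mean for each column,
# using only that column's non-missing values.
se_by_column <- function(x) {
    apply(x, 2, function(col) {
        n <- sum(!is.na(col))              # valid observations in this column
        sqrt(var(col, na.rm = TRUE) / n)   # sd / sqrt(n)
    })
}

# Made-up data with NAs in different rows of different columns.
m <- cbind(a = c(1, 2, 3, 4, NA), b = c(8, NA, 5, 1, 2))
se_by_column(m)
```

Here each column keeps four observations, whereas row-wise deletion would leave only three complete rows for both columns.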


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 08 Apr 2008 - 21:52:31 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.