# Re: [R] Problem with NA data when computing standard error

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Tue, 08 Apr 2008 16:00:57 -0500

On Tue, Apr 8, 2008 at 12:44 PM, LeCzar <sirnixu_at_gmail.com> wrote:
>
> Hey,
>
> I want to compute means and standard errors as two tables like this:
>
> se<-function(x)sqrt(var(x)/length(x))
>
>

The missing values are not your main problem.

When x is a matrix or data frame, the command var computes the whole variance-covariance matrix. Some of the covariance values can be negative, so taking the square root of every entry is a mistake.

For example, run

> example(var)

to get some matrices to work with.

> C1[3,4] <- NA
> C1[3,5] <- NA

Observe you can calculate

> var(C1, na.rm=T)

but you cannot usefully take sqrt of that, because sqrt would be applied to the negative covariances as well and would return NaN (with a warning) for them.
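To see the problem on a small scale, here is a minimal sketch with a made-up matrix. The name m and its values are my own invention for illustration, not part of example(var):

```r
# Hypothetical data: column b falls as column a rises,
# so their covariance is negative.
m <- cbind(a = c(1, 2, 3, 4), b = c(8, 6, 5, 1))

v <- var(m)          # 2 x 2 variance-covariance matrix
v["a", "b"] < 0      # TRUE: the off-diagonal entry is negative

# sqrt() of a negative number is NaN; R also issues a warning,
# which is suppressed here to keep the output quiet.
r <- suppressWarnings(sqrt(v))
is.nan(r["a", "b"])  # TRUE
```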

To get the standard errors you have to rethink the problem. Start with something like

> diag(var(C1, na.rm=T))

That gives the diagonal elements, which are the variances and so cannot be negative, so

> sqrt(diag(var(C1, na.rm=T)))

works as well.
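As a quick check that the diagonal really holds the column variances, here is a sketch on a small made-up matrix without missing values (m is hypothetical, not from example(var)):

```r
# Hypothetical data with no missing values
m <- cbind(a = c(1, 2, 3, 4), b = c(8, 6, 5, 1))

d <- diag(var(m))    # diagonal of the covariance matrix

# Same numbers as computing var() on each column separately
all.equal(unname(d), unname(apply(m, 2, var)))

# Their square roots are the column standard deviations
all.equal(unname(sqrt(d)), unname(apply(m, 2, sd)))
```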

But you have the separate problem of dividing each one by the square root of the number of observations, and since there are missing values that number is not the same for every column. Maybe somebody knows a smarter way, but this counts the non-missing values in each column:

validX <- colSums( ! is.na(C1))

This gives their square roots:

sqrt(validX)
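To make the counting step concrete, here is a small sketch. The matrix m and its single NA are made up for illustration:

```r
# Hypothetical 4 x 2 matrix with one missing value in column b
m <- cbind(a = c(1, 2, 3, 4), b = c(8, NA, 5, 1))

present <- !is.na(m)          # logical matrix: TRUE where a value is present
validX  <- colSums(present)   # TRUE counts as 1, so this counts valid values
validX                        # a has 4 valid values, b has 3

sqrt(validX)                  # per-column denominators for the standard error
```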

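Putting that together, it seems to me you could try

se <- function(x) {
    # square roots of the diagonal of the covariance matrix
    # (with na.rm=TRUE, var() leaves out every row that contains an NA)
    myDiag <- sqrt(diag(var(x, na.rm=TRUE)))
    # number of non-missing values in each column
    validX <- colSums(!is.na(x))
    myDiag/sqrt(validX)
}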

That works for me:

> se(C1)

```
       Fertility      Agriculture      Examination        Education
       50.740226       110.808614        39.390611        39.303898
        Catholic Infant.Mortality
      328.272207         4.513863
```
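One caveat worth checking on your own data: as far as I can tell from ?var, na.rm=TRUE on a matrix leaves out every row that contains an NA, while colSums(!is.na(x)) counts the valid values column by column. The variance and the count can then be based on different sets of observations. A sketch of a per-column variant follows; the name se_by_column and the test matrix m are my own, so treat this as one possible approach rather than the definitive one:

```r
se_by_column <- function(x) {
  x <- as.matrix(x)
  # variance of each column, dropping only that column's own NAs
  v <- apply(x, 2, var, na.rm = TRUE)
  # number of non-missing values in each column
  n <- colSums(!is.na(x))
  sqrt(v / n)
}

# Hypothetical data: one NA in column b, none in column a
m <- cbind(a = c(1, 2, 3, 4), b = c(8, NA, 5, 1))
res <- se_by_column(m)
res

# Column a has no NAs, so it matches the single-vector formula
# sqrt(var(x)/length(x)) from the original question
all.equal(unname(res["a"]), sqrt(var(1:4) / 4))
```

For a single vector without missing values this reduces to the formula in the original question, which makes it easy to check by hand.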
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help