[R] counting Na/not NA by groups by column

From: steven mosher <moshersteven_at_gmail.com>
Date: Wed, 09 Jun 2010 18:03:09 -0700


# create a matrix with some random NAs in it
> m<-matrix(NA,nrow=15,ncol=14)
> m[,3:14]<-52
> m[13,9]<-NA
> m[4:7,8]<-NA
> m[1:2,5]<-NA
> m[,2]<-rep(1800:1804, by=3)
> y<-order(m[,2])
> m<-m[y,]
> m[,1]<-rep(1:3,by=5)
> m

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,] 1 1800 52 52 NA 52 52 52 52 52 52 52 52  52
 [2,] 2 1800 52 52 52 52 52 NA 52 52 52 52 52  52
 [3,] 3 1800 52 52 52 52 52 52 52 52 52 52 52  52
 [4,] 1 1801 52 52 NA 52 52 52 52 52 52 52 52  52
 [5,] 2 1801 52 52 52 52 52 NA 52 52 52 52 52  52
 [6,] 3 1801 52 52 52 52 52 52 52 52 52 52 52  52
 [7,] 1 1802 52 52 52 52 52 52 52 52 52 52 52  52
 [8,] 2 1802 52 52 52 52 52 52 52 52 52 52 52  52
 [9,] 3 1802 52 52 52 52 52 52 NA 52 52 52 52  52
[10,] 1 1803 52 52 52 52 52 NA 52 52 52 52 52  52
[11,] 2 1803 52 52 52 52 52 52 52 52 52 52 52  52
[12,] 3 1803 52 52 52 52 52 52 52 52 52 52 52  52
[13,] 1 1804 52 52 52 52 52 NA 52 52 52 52 52  52
[14,] 2 1804 52 52 52 52 52 52 52 52 52 52 52  52
[15,] 3 1804 52 52 52 52 52 52 52 52 52 52 52  52

# the goal is to count all NON NA by changes in column 2 # we can get the count for all rows easily.
> col.sum<-(apply(!is.na(m[,3:14]),2,sum))
> col.sum

 [1] 15 15 13 15 15 11 14 15 15 15 15 15

# what we want is a result that looks like this

   1800 3 3 2 3 3 2 3 3 3 3 3 3    1801 3 3 2 3 3 2 3 3 3 3 3 3    1802 3 3 3 3 3 3 2 3 3 3 3 3    1803 3 3 3 3 3 2 3 3 3 3 3 3    1804 3 3 3 3 3 2 3 3 3 3 3 3

I've toyed a bit with By

> mask<-!is.na(m[,3:14])
> test<-cbind(m[,1:2],mask)
> test

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
 [1,] 1 1800 1 1 0 1 1 1 1 1 1 1 1   1
 [2,] 2 1800 1 1 1 1 1 0 1 1 1 1 1   1
 [3,] 3 1800 1 1 1 1 1 1 1 1 1 1 1   1
 [4,] 1 1801 1 1 0 1 1 1 1 1 1 1 1   1
 [5,] 2 1801 1 1 1 1 1 0 1 1 1 1 1   1
 [6,] 3 1801 1 1 1 1 1 1 1 1 1 1 1   1
 [7,] 1 1802 1 1 1 1 1 1 1 1 1 1 1   1
 [8,] 2 1802 1 1 1 1 1 1 1 1 1 1 1   1
 [9,] 3 1802 1 1 1 1 1 1 0 1 1 1 1   1
[10,] 1 1803 1 1 1 1 1 0 1 1 1 1 1   1
[11,] 2 1803 1 1 1 1 1 1 1 1 1 1 1   1
[12,] 3 1803 1 1 1 1 1 1 1 1 1 1 1   1
[13,] 1 1804 1 1 1 1 1 0 1 1 1 1 1   1
[14,] 2 1804 1 1 1 1 1 1 1 1 1 1 1   1
[15,] 3 1804 1 1 1 1 1 1 1 1 1 1 1   1

> result<-by(test[,3:14],test[,2], sum)
> result

INDICES: 1800
[1] 34



INDICES: 1801
[1] 34

INDICES: 1802
[1] 35

INDICES: 1803
[1] 35

INDICES: 1804
[1] 35
>

as this sums all the values and not by column. it's wrong  so is there an elegant way to get the number of NON Nas.. by column governed by changes in the values of a variable.

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 10 Jun 2010 - 01:06:14 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Jun 2010 - 13:10:30 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive