Re: [R] How to take the average of multiple rows

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Sun, 15 Jun 2008 13:38:27 -0500

on 06/15/2008 01:18 PM ss wrote:
> Dear all,
>
>
> I have a matrix, called newdata1,
>
> > dim(newdata1)
> [1] 34176 83
>
> It looks like:
>
> EntrezID Name S1 S2 S3 S4 S5.....
> 1 4076 CAPRIN1 0.1 0.2 0.3...
> 2 139170 WDR40B 0.4 0.5 0.6...
> 3 5505 PPP1R2P1 0.3 0.3 0.7...
> 4 4076 CAPRIN1 0.7 0.3 0.2...
> 5 139170 WDR40B null 0.8 0.4...
> 6 139170 WDR40B null null 0.75...
>
>
> If there are rows whose "EntrezID" and "Name" are exactly the same,
> I want to take the average for these rows.
>
> There might be some 'null's in the data set. For example, there are
> three rows of the same "EntrezID" and "Name" as 139170 and WDR40B.
> For the sample called 'S1', there are three values, 0.4, null, null. For
> this scenario, I want to keep the final value as 0.4 For the sample 'S2',
> the values are 0.5, 0.8, null. For this, I want to ignore 'null', and take
> the average of 0.5 and 0.8, the final value is (0.5+0.8)/2=0.65. For the
> sample 'S3', there is no 'null', so just take the average as
> (0.6+0.4+0.75)/3=0.5833.
>
> Can you show me how to do this?
>
> I appreciate!

If your data file is exactly the way you have it above, you first want to convert the 'null' entries to NA so that they are treated as missing values by R.

Thus:

DF <- read.table(YourFileName, header = TRUE, na.strings = "null")
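For anyone who wants to try this step without a file on disk, here is a minimal sketch. It assumes the sample rows from the question, truncated to S1 through S3 and without the leading row numbers, and it uses the text = argument of read.table() in place of YourFileName. The variable name txt is only for illustration.

```r
# Sample rows from the question, truncated to S1..S3 (illustrative data only).
txt <- "EntrezID Name S1 S2 S3
4076 CAPRIN1 0.1 0.2 0.3
139170 WDR40B 0.4 0.5 0.6
5505 PPP1R2P1 0.3 0.3 0.7
4076 CAPRIN1 0.7 0.3 0.2
139170 WDR40B null 0.8 0.4
139170 WDR40B null null 0.75"

# text = stands in for YourFileName; na.strings turns every "null" into NA.
DF <- read.table(text = txt, header = TRUE, na.strings = "null")

# Inspect the column types: S1 and S2 should be numeric, with NA where
# the input had 'null'.
str(DF)
```

Without na.strings = "null", the 'null' entries would keep S1 and S2 from being read as numeric columns, and mean() could not be applied to them.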

> DF

  EntrezID     Name  S1  S2   S3
1     4076  CAPRIN1 0.1 0.2 0.30
2   139170   WDR40B 0.4 0.5 0.60
3     5505 PPP1R2P1 0.3 0.3 0.70
4     4076  CAPRIN1 0.7 0.3 0.20
5   139170   WDR40B  NA 0.8 0.40
6   139170   WDR40B  NA  NA 0.75

Then use aggregate():

> aggregate(DF[, -c(1:2)], by = list(DF$EntrezID, DF$Name),
            mean, na.rm = TRUE)
  Group.1  Group.2  S1   S2        S3
1    4076  CAPRIN1 0.4 0.25 0.2500000
2    5505 PPP1R2P1 0.3 0.30 0.7000000
3  139170   WDR40B 0.4 0.65 0.5833333

See ?read.table, ?aggregate and ?mean for more information. Take note of the 'na.rm' argument in ?mean.
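As a self-contained check of the result above, the sketch below rebuilds the six sample rows directly with data.frame() (illustrative data, not the full 34176 x 83 set). It also names the elements of the 'by' list, which makes aggregate() label the grouping columns EntrezID and Name rather than Group.1 and Group.2. The object name res is just an assumption for the example.

```r
# Six sample rows from the question; the 'null' entries are entered as NA.
DF <- data.frame(
  EntrezID = c(4076, 139170, 5505, 4076, 139170, 139170),
  Name     = c("CAPRIN1", "WDR40B", "PPP1R2P1", "CAPRIN1", "WDR40B", "WDR40B"),
  S1       = c(0.1, 0.4, 0.3, 0.7, NA, NA),
  S2       = c(0.2, 0.5, 0.3, 0.3, 0.8, NA),
  S3       = c(0.3, 0.6, 0.7, 0.2, 0.4, 0.75)
)

# Average every sample column within each (EntrezID, Name) group.
# na.rm = TRUE is passed on to mean(), so NA values are skipped.
# Naming the 'by' elements sets the names of the grouping columns.
res <- aggregate(DF[, -c(1:2)],
                 by = list(EntrezID = DF$EntrezID, Name = DF$Name),
                 FUN = mean, na.rm = TRUE)
print(res)
```

One caveat: if every value for a sample within a group is NA, mean(..., na.rm = TRUE) returns NaN for that cell rather than NA.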

HTH,

Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 15 Jun 2008 - 18:42:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 15 Jun 2008 - 19:30:40 GMT.
