Re: [R] Average 2 Columns when possible, or return available value

From: Joshua Wiley <jwiley.psych_at_gmail.com>
Date: Fri, 25 Jun 2010 17:36:15 -0700

On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys <jorismeys_at_gmail.com> wrote:
> Just want to add that if you want to clean out the NA rows in a matrix
> or data frame, take a look at ?complete.cases. Can be handy to use
> with big datasets. I got curious, so I just ran the codes given here
> on a big dataset, before and after removing NA rows. I have to be
> honest, this is surely an illustration of the power of rowMeans. I'm
> amazed myself.

I was too...the documentation (?rowMeans) wasn't joking:

"These functions are equivalent to use of 'apply' with 'FUN = mean' or 'FUN = sum' with appropriate margins, but are a lot faster."

>
> DF <- data.frame(
>  A=rep(DF$A,10000),
>  B=rep(DF$B,10000)
> )
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
>   user  system elapsed
>  13.26    0.06   13.46
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>   user  system elapsed
>   0.03    0.00    0.03
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>
> Timing stopped at: 227.84 1.03 249.31  -- I got impatient and pressed the escape
>
>> DF <- DF[complete.cases(DF),]
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
>   user  system elapsed
>   0.39    0.00    0.39
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>   user  system elapsed
>   0.01    0.00    0.02
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>   user  system elapsed
>  10.01    0.07   13.40
>
> Cheers
> Joris
>
>
> On Sat, Jun 26, 2010 at 1:08 AM, emorway <emorway_at_engr.colostate.edu> wrote:
>>
>> Forum,
>>
>> Using the following data:
>>
>> DF<-read.table(textConnection("A B
>> 22.60 NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>> 102.00 NA
>>  19.20 NA
>>  19.20 NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  11.80 NA
>>  7.62 NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  NA NA
>>  75.00 NA
>>  NA NA
>>  18.30 18.2
>>  NA NA
>>  NA NA
>>  8.44 NA
>>  18.00 NA
>>  NA NA
>>  12.90 NA"),header=T)
>> closeAllConnections()
>>
>> The second column is a duplicate reading of the first column, and when two
>> values are available, I would like to average column 1 and 2 (example code
>> below).  But if there is only one reading, I would like to retain it, but I
>> haven't found a good way to exclude NA's using the following code:
>>
>> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>>
>> Currently, row 24 is the only row with a returned value.  I'd like the
>> result to return column "A" if it is the only available value, and average
>> where possible.  Of course, if both columns are NA, NA is the only possible
>> result.
>>
>> The result I'm after would look like this (row 24 is an avg):
>>
>>  22.60
>>    NA
>>    NA
>>    NA
>>    NA
>>    NA
>>    NA
>>    NA
>> 102.00
>>  19.20
>>  19.20
>>    NA
>>    NA
>>    NA
>>  11.80
>>  7.62
>>    NA
>>    NA
>>    NA
>>    NA
>>    NA
>>  75.00
>>    NA
>>  18.25
>>    NA
>>    NA
>>  8.44
>>  18.00
>>    NA
>>  12.90
>>
>> This is a small example from a much larger data frame, so if you're
>> wondering what the deal is with list(), that will come into play for the
>> larger problem I'm trying to solve.
>>
>> Respectfully,
>> Eric
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> Joris.Meys_at_Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat 26 Jun 2010 - 00:42:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 26 Jun 2010 - 01:10:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive