From: Joshua Wiley <jwiley.psych_at_gmail.com>

Date: Fri, 25 Jun 2010 17:36:15 -0700

On Fri, Jun 25, 2010 at 5:24 PM, Joris Meys <jorismeys_at_gmail.com> wrote:

> Just want to add that if you want to clean out the NA rows in a matrix
> or data frame, take a look at ?complete.cases. Can be handy to use
> with big datasets. I got curious, so I just ran the codes given here
> on a big dataset, before and after removing NA rows. I have to be
> honest, this is surely an illustration of the power of rowMeans. I'm
> amazed myself.

I was too. The documentation (?rowMeans) wasn't joking:

"These functions are equivalent to use of 'apply' with 'FUN = mean' or 'FUN = sum' with appropriate margins, but are a lot faster."
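
As a quick sanity check of that equivalence, here is a minimal sketch on a small made-up matrix. The matrix m and its values are my own illustration, not data from this thread; it only shows that the two calls agree, not how they compare for speed.

```r
# Illustrative 3 x 2 matrix with some missing values (hypothetical data).
m <- matrix(c(1, NA, 3,    # column 1
              4, 5, NA),   # column 2
            nrow = 3)

# Row means computed the slow way and the fast way.
via_apply    <- apply(m, 1, mean, na.rm = TRUE)
via_rowMeans <- rowMeans(m, na.rm = TRUE)

# Both approaches should give the same numbers.
stopifnot(isTRUE(all.equal(via_apply, via_rowMeans)))
```

Wrapping each call in system.time() on a larger object, as in the quoted timings below, is one way to see the speed difference on your own machine.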

>
> DF <- data.frame(
> A=rep(DF$A,10000),
> B=rep(DF$B,10000)
> )
>

>> system.time(apply(DF,1,mean,na.rm=TRUE))
>    user  system elapsed
>   13.26    0.06   13.46
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>    user  system elapsed
>    0.03    0.00    0.03
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>
> Timing stopped at: 227.84 1.03 249.31 -- I got impatient and pressed the escape
>
>> DF <- DF[complete.cases(DF),]
>
>> system.time(apply(DF,1,mean,na.rm=TRUE))
>    user  system elapsed
>    0.39    0.00    0.39
>
>> system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
>    user  system elapsed
>    0.01    0.00    0.02
>
>> system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
> + na.rm=TRUE)[,-1]))
> + )
>    user  system elapsed
>   10.01    0.07   13.40
>
> Cheers
> Joris
>
>
> On Sat, Jun 26, 2010 at 1:08 AM, emorway <emorway_at_engr.colostate.edu> wrote:
>>
>> Forum,
>>
>> Using the following data:
>>
>> DF<-read.table(textConnection("A B
>> 22.60 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 102.00 NA
>> 19.20 NA
>> 19.20 NA
>> NA NA
>> NA NA
>> NA NA
>> 11.80 NA
>> 7.62 NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> NA NA
>> 75.00 NA
>> NA NA
>> 18.30 18.2
>> NA NA
>> NA NA
>> 8.44 NA
>> 18.00 NA
>> NA NA
>> 12.90 NA"),header=T)
>> closeAllConnections()
>>
>> The second column is a duplicate reading of the first column, and when two
>> values are available, I would like to average column 1 and 2 (example code
>> below). But if there is only one reading, I would like to retain it, but I
>> haven't found a good way to exclude NA's using the following code:
>>
>> t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))
>>
>> Currently, row 24 is the only row with a returned value. I'd like the
>> result to return column "A" if it is the only available value, and average
>> where possible. Of course, if both columns are NA, NA is the only possible
>> result.
>>
>> The result I'm after would look like this (row 24 is an avg):
>>
>> 22.60
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> NA
>> 102.00
>> 19.20
>> 19.20
>> NA
>> NA
>> NA
>> 11.80
>> 7.62
>> NA
>> NA
>> NA
>> NA
>> NA
>> 75.00
>> NA
>> 18.25
>> NA
>> NA
>> 8.44
>> 18.00
>> NA
>> 12.90
>>
>> This is a small example from a much larger data frame, so if you're
>> wondering what the deal is with list(), that will come into play for the
>> larger problem I'm trying to solve.
>>
>> Respectfully,
>> Eric
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> tel : +32 9 264 59 87
> Joris.Meys_at_Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
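
For the original question (average the two readings when both exist, keep the single reading otherwise, and return NA when both are missing), a minimal sketch with rowMeans might look like the following. One detail to watch: with na.rm = TRUE, rowMeans returns NaN rather than NA for a row where every value is missing, so the sketch converts those. The four rows are a subset of the posted data, and the names avg and both are only illustrative.

```r
# Four rows taken from the posted data: a single reading, no reading,
# a duplicated reading, and another single reading.
DF <- data.frame(A = c(22.60, NA, 18.30, 12.90),
                 B = c(NA,    NA, 18.2,  NA))

# Average whatever readings are available in each row.
avg <- unname(rowMeans(as.matrix(DF), na.rm = TRUE))

# Rows with no readings at all come back as NaN; the poster wanted NA.
avg[is.nan(avg)] <- NA

# Resulting values: 22.60, NA, 18.25, 12.90
result <- matrix(avg, ncol = 1)

# For comparison: complete.cases() keeps only rows where both readings
# are present, so single readings would be dropped by that filter.
both <- DF[complete.cases(DF), ]
```

With the posted data only row 24 has both readings, so the timings taken after the complete.cases step above are on a much smaller data frame than the ones taken before it.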

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Received on Sat 26 Jun 2010 - 00:42:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sat 26 Jun 2010 - 01:10:35 GMT.
