[R] use rowSums or colSums instead of apply!

From: Tim Hesterberg <timh_at_insightful.com>
Date: Tue, 19 Feb 2008 15:50:43 -0800

There were two queries recently regarding removing rows or columns that have all NAs.

Three respondents suggested combinations of apply() with any() or all().

I cringe when I see apply() used unnecessarily. Using rowSums() or colSums() is much faster, and gives more readable code. (Two respondents did suggest colSums for the second query.)
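
For anyone who missed the earlier threads, here is a minimal sketch of the idea: count the NAs in each row (or column) and keep those where the count is below the number of columns (or rows). The data frame d and its all-NA column "empty" are made up for illustration and are not from the original queries.

```r
# Hypothetical example data: rows 4 and 5 are entirely NA,
# and the column `empty` is entirely NA.
d <- data.frame(a = c(1:3, NA, NA, 4),
                b = c(7:9, NA, NA, NA),
                empty = NA)

# A row (column) has at least one non-NA value when its NA count
# is smaller than the number of columns (rows).
keep_rows <- rowSums(is.na(d)) < ncol(d)
keep_cols <- colSums(is.na(d)) < nrow(d)

# drop = FALSE keeps the result a data frame even if one column survives
d_clean <- d[keep_rows, keep_cols, drop = FALSE]
```

Comparing against ncol(d) and nrow(d), rather than a hard-coded count, means the same lines should keep working if the shape of the data changes.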

# original small data frame

df <- data.frame(col1=c(1:3,NA,NA,4),col2=c(7:9,NA,NA,NA),col3=c(2:4,NA,NA,4))
system.time( for(i in 1:10^4) temp <- rowSums(is.na(df)) < 3)
# .078

system.time( for(i in 1:10^4) temp <- apply(df,1,function(x)any(!is.na(x))))
# 3.33

# larger data frame

x <- matrix(runif(10^5), 10^3)
x[ runif(10^5) < .99 ] <- NA
df2 <- data.frame(x)
system.time( for(i in 1:100) temp <- rowSums(is.na(df2)) < 100)
# .34

system.time( for(i in 1:100) temp <- apply(df2,1,function(x)any(!is.na(x))))
# 3.34
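
Before swapping one form for the other, it may be worth checking that both give the same answer. The following is only a sketch on made-up random data (the names m, d2, by_sums, by_apply and same are illustrative, and the seed is arbitrary):

```r
set.seed(1)                      # arbitrary seed, only for repeatability
m <- matrix(runif(200), 20)      # 20 rows, 10 columns
m[runif(200) < 0.9] <- NA        # make most entries NA
d2 <- data.frame(m)
d2[1, ] <- NA                    # force at least one all-NA row

# TRUE for rows that have at least one non-NA value, computed two ways
by_sums  <- rowSums(is.na(d2)) < ncol(d2)
by_apply <- apply(d2, 1, function(x) any(!is.na(x)))

# unname() because the two results may carry different names attributes
same <- identical(unname(by_sums), unname(by_apply))
```

Here same should come out TRUE, and by_sums[1] should be FALSE because the first row was forced to be all NA.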

Tim Hesterberg

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 19 Feb 2008 - 23:53:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 20 Feb 2008 - 00:30:16 GMT.
