From: Eric Archer <Eric.Archer_at_noaa.gov>

Date: Thu 04 May 2006 - 04:28:34 EST

Date: Thu 04 May 2006 - 04:28:34 EST

R-users,

I'm seeking any suggestions on optimizing some code for speed. Here's the setup: the code below is part of a larger chunk that is calculating Fst values across loci and alleles. This chunk is designed to calculate the proportion ('p.a') of an allele ('a') at a locus in each population ('p') and the proportion of individuals heterozygous for that allele ('h.a'). I'm not concerned with being slick in terms of using the most convenient functions, but would rather have it do the calculations as fast as possible as this bit is getting run very frequently and seems to be taking the most compute time. Profiling seems to indicate that functions like 'length(which(...))' and 'table(factor(...,level=a))' are more expensive than the logical vector creation scheme below. I just want to make sure I haven't overlooked any other viable options that might be available. Any and all suggestions are gladly welcomed. Thanks in advance.

Cheers,

eric

--- Variables used : 'pop' - population i.d. , 'a1' & 'a2' - alleles 1 and 2 at locus : all character vectors of equal length (no NAs) nvec - vector of number of individuals in population 'p' a - allele for which 'p.a' and 'p.het.a' are being calculated Here's some example data and then the code snippet in question: test <- structure(list(pop = c("1", "1", "1", "1", "1", "1", "1", "1",Received on Thu May 04 04:34:48 2006

"2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"

), loc4.a1 = c("3", "3", "4", "3", "3", "4", "4", "4", "3", "4",

"4", "3", "4", "2", "4", "4", "4", "4", "2", "4", "2"), loc4.a2 = c("3","3", "3", "3", "4", "3", "3", "3", "2", "3", "3", "3", "4", "2",

"3", "4", "3", "4", "1", "3", "1")), .Names = c("pop", "loc4.a1",

"loc4.a2"), na.action = structure(36, .Names = "36", class = "omit"),

row.names = c("1",

"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",

"14", "15", "16", "17", "18", "19", "20", "21"), class = "data.frame")

pop <- test$pop a1 <- test$loc4.a1 a2 <- test$loc4.a2 nvec <- table(pop) a <- "3" with.a <- a1 == a | a2 == a allele.stats <- sapply(names(nvec), function(p) { this.pop <- pop == p & with.a a.in.a1 <- a1[this.pop] == a a.in.a2 <- a2[this.pop] == a na1 <- length(a.in.a1[a.in.a1]) na2 <- length(a.in.a2[a.in.a2]) p.a <- (na1 + na2) / nvec[p] / 2 is.het <- a1[this.pop] != a2[this.pop] p.het.a <- length(is.het[is.het]) / nvec[p] c(p.a, p.het.a) }) -- Eric Archer, Ph.D. NOAA-SWFSC 8604 La Jolla Shores Dr. La Jolla, CA 92037 858-546-7121,7003(FAX) eric.archer@noaa.gov

"Lighthouses are more helpful than churches."

- Benjamin Franklin

"Cogita tute" - Think for yourself

______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Thu 04 May 2006 - 06:10:05 EST.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*