Re: [R] computational speed question

From: jim holtman <jholtman_at_gmail.com>
Date: Fri 07 Jul 2006 - 12:59:11 EST

Do you have a dataframe or a matrix? For what you are doing, this should be a matrix. I modified the code to use a matrix.
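If temp is currently a data frame, here is a minimal sketch of the conversion. It assumes every column is numeric, because as.matrix() on a data frame with a character column would return a character matrix. `temp_df` is a hypothetical stand-in for the real 250 x 20,000 data. Note `check.names = FALSE`: by default data.frame() (and read.table()) make duplicated names unique (A, A.1, ...), and grouping by name would then no longer work.

```r
# Hypothetical stand-in for the original data frame (real one is 250 x 20,000)
temp_df <- data.frame(A = c(1, 2, NA), B = c(3, 4, 5), A = c(5, 6, 7),
                      check.names = FALSE)   # keep the duplicated column names

# as.matrix() only gives a numeric matrix if every column is numeric
stopifnot(all(vapply(temp_df, is.numeric, logical(1))))

temp <- as.matrix(temp_df)   # the column names become colnames(temp)

# Same grouped row means as in the reply, plus the final cbind step
rowaverage <- function(x) rowMeans(temp[, x, drop = FALSE], na.rm = TRUE)
averages <- tapply(seq_len(ncol(temp)), colnames(temp), rowaverage)
averages <- do.call(cbind, averages)   # one column per distinct name
```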

Here is a test I ran on a 250 x 20,000 matrix; the grouped row means take under half a second.

> n <- 250 * 20000
> temp <- matrix(runif(n), nrow = 250)   # create some test data
> # add some names to the columns
> colnames(temp) <- sample(LETTERS, 20000, replace = TRUE)
> system.time({
+     rowaverage <- function(x) rowMeans(temp[, x, drop = FALSE], na.rm = TRUE)
+     averages <- tapply(seq(ncol(temp)), colnames(temp), rowaverage)
+ })
[1] 0.36 0.06 0.42 NA NA
> averages[1]

$A
  [1] 0.5112209 0.4897363 0.5185241 0.5024847 0.5175194 0.5015971 0.5056319 0.5033174 0.4829253 0.4988081
 [11] 0.4899275 0.4944568 0.5093905 0.5151457 0.4992038 0.4999269 0.5064075 0.5046926 0.4847728 0.5182708
 [21] 0.4856212 0.5008978 0.5092587 0.5024103 0.4924742 0.4980359 0.5008564 0.5133731 0.5060888 0.5029579
 [31] 0.4967817 0.4896169 0.4878164 0.5085761 0.5018058 0.5087363 0.5146902 0.5015729 0.4897018 0.5016664
 [41] 0.4831489 0.4799853 0.5042539 0.4982528 0.4943814 0.4974861 0.5017000 0.4962790 0.4946369 0.4972477
 [51] 0.5099817 0.4908761 0.4882964 0.5120057 0.5107774 0.5072615 0.5015804 0.4869726 0.5030821 0.5115180
 [61] 0.5015209 0.5008781 0.4980628 0.5098239 0.4997307 0.5073817 0.5025378 0.5001009 0.5113822 0.5120676
 [71] 0.5110741 0.4930692 0.4957875 0.4874903 0.5139654 0.4894350 0.4784051 0.4983018 0.5012512 0.4966652
 [81] 0.5096759 0.5113182 0.5024274 0.5034941 0.5110116 0.4987659 0.5071096 0.5309148 0.5010166 0.4917245
 [91] 0.5101545 0.4923370 0.5030376 0.5101287 0.4865259 0.5037619 0.4967567 0.4872504 0.5055892 0.5068486
[101] 0.5054698 0.4968007 0.4881163 0.4892564 0.4957340 0.4990345 0.4890099 0.4917706 0.5055200 0.4983850
[111] 0.5021503 0.5115502 0.4887234 0.5048929 0.5007715 0.4898906 0.4968263 0.4872401 0.5005940 0.4964652
[121] 0.4952756 0.4981603 0.4906134 0.4990960 0.5107533 0.4874836 0.4980351 0.4912040 0.4986199 0.4877104
[131] 0.4930825 0.4872884 0.4956200 0.5103397 0.4970019 0.5088349 0.4807062 0.4858911 0.5037099 0.5008526
[141] 0.4886732 0.5060938 0.5102449 0.5082154 0.5044781 0.4922414 0.4864518 0.4977510 0.5036223 0.4991290
[151] 0.4996673 0.4932963 0.4919180 0.4716422 0.4976030 0.4960977 0.4912395 0.4986145 0.5117688 0.5034218
[161] 0.5241384 0.5039768 0.4976212 0.4803117 0.5128103 0.4874540 0.5082491 0.5104243 0.5065004 0.4972450
[171] 0.4970007 0.5003468 0.5117209 0.5164802 0.5229826 0.4907171 0.5052669 0.4856187 0.4903460 0.4974213
[181] 0.5054415 0.5047443 0.4996494 0.4979700 0.5045505 0.4972314 0.5109166 0.4853377 0.5009606 0.5148585
[191] 0.4997406 0.4717888 0.4991179 0.5007500 0.4986203 0.4923562 0.5117240 0.4919311 0.4865237 0.5069973
[201] 0.5006723 0.4970111 0.5170562 0.5083913 0.5016317 0.5040758 0.5055306 0.5108655 0.5072274 0.4812014
[211] 0.4934747 0.4918549 0.4777216 0.4991372 0.4997925 0.4956298 0.5077896 0.4902372 0.4954648 0.4947687
[221] 0.5112607 0.4980376 0.5189054 0.5038222 0.5114223 0.5083394 0.5004338 0.5107744 0.4966209 0.4885567
[231] 0.4986649 0.4883257 0.4893635 0.5118492 0.4946722 0.4981683 0.5152559 0.5047399 0.4907663 0.4941242
[241] 0.5134061 0.5006573 0.4968652 0.5141095 0.5113004 0.4956569 0.5080689 0.4975634 0.4951120 0.5058744

>
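Another option, not what the reply above does, is base R's rowsum(), which sums the rows of a matrix by group in a single call instead of calling a function once per group. Since the groups here are columns, the sketch works on the transpose and divides by the number of non-missing values to reproduce `na.rm = TRUE`. `grouped_row_means` and the small test matrix are hypothetical; it would be worth timing both versions on the real data before preferring either.

```r
# Illustrative data: 5 rows, 6 columns with repeated names, one NA
set.seed(1)
m <- matrix(runif(30), nrow = 5)
colnames(m) <- c("A", "B", "A", "C", "B", "A")
m[2, 3] <- NA

# Hypothetical helper: row means over columns that share a name
grouped_row_means <- function(m) {
  g <- colnames(m)
  # rowsum() groups the rows of its argument, so work on the transpose
  sums   <- rowsum(t(m), g, na.rm = TRUE)   # groups x rows, NAs skipped
  counts <- rowsum(t(!is.na(m)) + 0, g)     # non-missing values per group and row
  # all-NA group/row gives 0/0 = NaN, as rowMeans(..., na.rm = TRUE) does
  t(sums / counts)                          # rows x groups, like do.call(cbind, averages)
}

fast <- grouped_row_means(m)
```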

On 7/6/06, markleeds@verizon.net <markleeds@verizon.net> wrote:
>
> I have a 250-row by 20,000-column data frame called temp, and
> I run
>
> rowaverage<-function(x) rowMeans(temp[x],na.rm=TRUE )
> averages<-tapply(seq(temp),names(temp),rowaverage)
> averages<-do.call('cbind',averages)
>
> Is it okay that it's been running for 4 hours, or
> does this mean that something went wrong? I am on Windows
> XP, and I did Ctrl-Alt-Delete; as far as I can tell, the process
> is still running. I have 4 CPUs,
> and one of them is being used at its max. When
> I open Task Manager and click on Processes,
> it says 699,412K under the Mem Usage column,
> but that number hasn't changed in a long time.
>
> When I click on Performance, CPU usage says 25%, which makes sense,
> and PF Usage says 970 MB. Physical memory total is 8387312,
> available 6868916, and system cache is 3007044, but these numbers
> move around slightly.
>
> Kernel memory total is 108072,
> paged 81344,
> nonpaged 26744.
>
> As far as hardware goes, I'm pretty clueless. All
> I know is that I have 4 CPUs (actually 2 CPUs, but
> somehow each CPU is 2 CPUs, whatever that means; only one of the 4 gets
> used at any one time unless I run multiple instances of R) and 9 GB of
> RAM. I don't know what kind of chip I have, but I know the computer is from
> Dell.
>
> It's okay if it takes this long, but I was just wondering
> if there is a way to check whether things have stopped or
> somehow frozen.
>
> When I do a tail on the .Rout file (I am running the program
> using R CMD BATCH), it's just sitting at the same spot
> where this computation would be done, so I can't tell much from that.
> Thanks.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
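On telling a slow job from a frozen one: while a single tapply() call is running, nothing new is written, so a tail on the .Rout file cannot show progress. One option (a sketch, not part of the reply above) is to loop over the groups yourself and print a time-stamped line every few groups. `group_row_means_verbose` and `report_every` are hypothetical names; flush(stdout()) asks R to push each line out, though whether it appears in the .Rout file immediately may depend on the platform's buffering.

```r
# Hypothetical helper: grouped row means with a progress line every few groups
group_row_means_verbose <- function(m, report_every = 1) {
  idx <- split(seq_len(ncol(m)), colnames(m))   # column indices for each name
  out <- matrix(NA_real_, nrow = nrow(m), ncol = length(idx),
                dimnames = list(rownames(m), names(idx)))
  for (i in seq_along(idx)) {
    out[, i] <- rowMeans(m[, idx[[i]], drop = FALSE], na.rm = TRUE)
    if (i %% report_every == 0 || i == length(idx)) {
      cat(format(Sys.time(), "%H:%M:%S"), "finished group", i,
          "of", length(idx), "\n")
      flush(stdout())   # try to get the line into the output file right away
    }
  }
  out
}

# Small demonstration with made-up data
demo <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
               dimnames = list(NULL, c("A", "B", "A")))
group_row_means_verbose(demo)
```

For the real data, something like `averages <- group_row_means_verbose(temp, report_every = 5)` would print a line after every fifth name.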

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?


Received on Fri Jul 07 13:03:56 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 07 Jul 2006 - 18:17:28 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.