Re: [R] loop over large dataset

From: Uwe Ligges <ligges_at_statistik.uni-dortmund.de>
Date: Mon 04 Jul 2005 - 21:41:04 EST

Federico Calboli wrote:

> In my absentmindedness I'd forgotten to CC this to the list... and
> BTW, using gc() in the loop increases the runtime...

As the data size increases, you cannot expect run time to grow linearly, e.g. because gc() is called more frequently. And of course, each call to gc() takes some time itself, so calling it inside the loop gives you the expected increase in runtime. This answers your other question as well.
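
A minimal sketch of how one might check this, assuming a toy workload rather than the original code (the functions `grow` and `time_with_gc` are hypothetical names introduced here). It times a loop at several sizes and uses gc.time() to report how much of the elapsed time went to garbage collection:

```r
# Hypothetical toy workload (not the original poster's code): grow a vector
# one element at a time, so each iteration allocates a fresh copy.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i)
  x
}

# Time one run and report how much of it was spent in the garbage collector.
time_with_gc <- function(n) {
  gc_before <- gc.time()[[3]]   # elapsed GC time accumulated so far
  elapsed <- system.time(grow(n), gcFirst = FALSE)[["elapsed"]]
  c(n = n, elapsed = elapsed, gc_elapsed = gc.time()[[3]] - gc_before)
}

# Double n each time; linear behaviour would roughly double `elapsed`.
timings <- t(sapply(c(2000, 4000, 8000), time_with_gc))
print(timings)
```

The actual numbers depend on the machine and the R version, so none are shown here. Comparing the `elapsed` and `gc_elapsed` columns across sizes indicates whether garbage collection accounts for the extra time.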

Uwe Ligges

>>>My suggestion is that you try to vectorize the computation as much as
>>>you can.
>>>
>>>From what you've shown, `new' and `ped' need to have the same number
>>>of rows, right?
>>>
>>>Your `off' function seems to be randomly choosing between columns 1
>>>and 2 from its two input matrices (one row each?). You may want to do
>>>the sampling all at once instead of looping over the rows. E.g.,
>>>
>>> > (m <- matrix(1:10, ncol=2))
>>>      [,1] [,2]
>>> [1,]    1    6
>>> [2,]    2    7
>>> [3,]    3    8
>>> [4,]    4    9
>>> [5,]    5   10
>>>
>>> > (colSample <- sample(1:2, nrow(m), replace=TRUE))
>>> [1] 1 1 2 1 1
>>>
>>> > (x <- m[cbind(1:nrow(m), colSample)])
>>> [1] 1 2 8 4 5
>>>
>>>So you might want to do something like (obviously untested):
>>>
>>>todo <- ped[,3] * ped[,5] != 0 ## indicator of which rows to work on
>>>n.todo <- sum(todo) ## how many are there?
>>>sire <- new[ped[todo, 3], ]
>>>dam <- new[ped[todo, 5], ]
>>>s.gam <- sire[cbind(1:nrow(sire), sample(1:2, nrow(sire), replace=TRUE))]
>>>d.gam <- dam[cbind(1:nrow(dam), sample(1:2, nrow(dam), replace=TRUE))]
>>>new[todo, 1:2] <- cbind(s.gam, d.gam)
>>>
>>>
>>
>>Improving the efficiency of the code is obviously a plus, but the
>>real thing I am mesmerised by is the sheer increase in runtime...
>>how come not a linear increase with dataset size?
>>
>>Cheers,
>>
>>Federico
>>
>>--
>>Federico C. F. Calboli
>>Department of Epidemiology and Public Health
>>Imperial College, St. Mary's Campus
>>Norfolk Place, London W2 1PG
>>
>>Tel +44 (0)20 75941602 Fax +44 (0)20 75943193
>>
>>f.calboli [.a.t] imperial.ac.uk
>>f.calboli [.a.t] gmail.com
>>
>>
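
The vectorized suggestion quoted above can be tried on toy data. This is only a sketch under stated assumptions: the contents of `ped` and `new` below are made up for illustration (the original data were not posted), columns 3 and 5 of `ped` are assumed to hold the row indices of sire and dam with 0 meaning unknown, and `pick_gamete` is a hypothetical helper name.

```r
set.seed(1)  # only so the sketch is repeatable

# Hypothetical toy pedigree: columns 3 and 5 hold the row index of the sire
# and dam (0 = unknown parent); the other columns are placeholders.
ped <- cbind(id = 1:6, x = 0, sire = c(0, 0, 0, 1, 1, 1),
             y = 0, dam = c(0, 0, 0, 2, 3, 2))

# Hypothetical genotype matrix: two alleles per individual. Founders
# (rows 1-3) are filled in; offspring rows start as NA.
new <- rbind(c(11, 12), c(21, 22), c(31, 32),
             c(NA, NA), c(NA, NA), c(NA, NA))

# Pick one of the two columns at random for every row of `parents`,
# using a two-column (row, column) index matrix.
pick_gamete <- function(parents) {
  n <- nrow(parents)
  parents[cbind(seq_len(n), sample(1:2, n, replace = TRUE))]
}

todo <- ped[, 3] * ped[, 5] != 0            # rows with both parents known
sire <- new[ped[todo, 3], , drop = FALSE]   # drop = FALSE keeps a matrix
dam  <- new[ped[todo, 5], , drop = FALSE]
new[todo, 1:2] <- cbind(pick_gamete(sire), pick_gamete(dam))
print(new)
```

In this toy pedigree every offspring has founder parents, so a single pass is enough. If parents can themselves be offspring, the same step would presumably have to be applied one generation at a time, since a parent's genotype must be filled in before it is sampled.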


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 04 21:45:23 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:33:11 EST