[R] loop over large dataset

From: Federico Calboli <f.calboli_at_imperial.ac.uk>
Date: Mon 04 Jul 2005 - 20:23:12 EST

In my absentmindedness I'd forgotten to CC this to the list... and BTW, using gc() in the loop increases the runtime...

>> My suggestion is that you try to vectorize the computation as much
>> as you can.
>>
>> From what you've shown, `new' and `ped' need to have the same number
>> of rows, right?
>>
>> Your `off' function seems to be randomly choosing between columns 1
>> and 2 from its two input matrices (one row each?). You may want to do
>> the sampling all at once instead of looping over the rows. E.g.,
>>
>> > (m <- matrix(1:10, ncol=2))
>>      [,1] [,2]
>> [1,]    1    6
>> [2,]    2    7
>> [3,]    3    8
>> [4,]    4    9
>> [5,]    5   10
>> > (colSample <- sample(1:2, nrow(m), replace=TRUE))
>> [1] 1 1 2 1 1
>> > (x <- m[cbind(1:nrow(m), colSample)])
>> [1] 1 2 8 4 5
>>
>> So you might want to do something like (obviously untested):
>>
>> todo <- ped[,3] * ped[,5] != 0 ## indicator of which rows to work on
>> n.todo <- sum(todo) ## how many are there?
>> sire <- new[ped[todo, 3], , drop=FALSE]
>> dam <- new[ped[todo, 5], , drop=FALSE]
>> ## a two-column (row, column) index matrix picks one cell per row
>> s.gam <- sire[cbind(seq_len(n.todo), sample(1:2, n.todo, replace=TRUE))]
>> d.gam <- dam[cbind(seq_len(n.todo), sample(1:2, n.todo, replace=TRUE))]
>> new[todo, 1:2] <- cbind(s.gam, d.gam)
>>
>>
>
> Improving the efficiency of the code is obviously a plus, but the
> real thing I am mesmerised by is the sheer increase in runtime...
> how come it does not increase linearly with dataset size?
>
> Cheers,
>
> Federico
>
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St. Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel +44 (0)20 75941602 Fax +44 (0)20 75943193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>
>
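
For reference, here is a minimal runnable sketch of the vectorized sampling suggested in the quoted message. The toy `ped' and `new' matrices are made up for illustration; the only things taken from the thread are that columns 3 and 5 of `ped' hold the sire and dam row numbers (0 when unknown) and that `new' has two columns per individual. It also assumes every parent's row in `new' is already filled in, so a pedigree spanning several generations would presumably have to be processed one generation at a time.

```r
set.seed(1)  # reproducible draws for the illustration

# Hypothetical toy data: 4 founders (parents coded 0) and 3 offspring.
# Column 3 = sire row, column 5 = dam row, as in the quoted suggestion.
ped <- matrix(0, nrow = 7, ncol = 5)
ped[5:7, 3] <- c(1, 1, 3)   # sires of individuals 5, 6, 7
ped[5:7, 5] <- c(2, 4, 4)   # dams of individuals 5, 6, 7

# Two alleles per individual; founders known, offspring still NA.
new <- matrix(NA_integer_, nrow = 7, ncol = 2)
new[1:4, ] <- matrix(1:8, ncol = 2)   # founder i carries alleles i and i + 4

todo <- ped[, 3] * ped[, 5] != 0            # rows with both parents known
n    <- sum(todo)
sire <- new[ped[todo, 3], , drop = FALSE]   # one sire row per offspring
dam  <- new[ped[todo, 5], , drop = FALSE]   # one dam row per offspring

# A two-column index matrix selects one (row, column) cell per offspring,
# so each offspring gets one randomly chosen allele from each parent.
s.gam <- sire[cbind(seq_len(n), sample(1:2, n, replace = TRUE))]
d.gam <- dam[cbind(seq_len(n), sample(1:2, n, replace = TRUE))]

new[todo, 1:2] <- cbind(s.gam, d.gam)
print(new)
```

Indexing with `cbind(row, column)' returns a plain vector with one element per offspring, whereas `sire[rows, cols]' with two separate index vectors would return a whole sub-matrix.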


