Re: [R] Question on memory allocation & loop

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu 29 Jun 2006 - 17:21:03 EST

On Thu, 29 Jun 2006, Manoj wrote:

> Hello All,
> I am trying to work on writing the following piece of (pseudo)
> code in an optimal fashion:
>
> ----------------------------------------------------
> # Two data frames with some data
>
> a = data.frame(somedata)
> b = data.frame(somedata)
>
> for(i in 1:nrow(dt)) {
> # Merge dates for a given date into a new data frame
> c = merge(a[a$dt==dt[i],],b[b$dt == dt[i],], by=c(some column));
> }

Note that each pass through that loop overwrites 'c', so only the last iteration is actually needed.

What are you really trying to do, and why are you worrying about memory? E.g. merge() in R-devel is a lot more efficient for some operations, including perhaps your example.
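
To illustrate the point about the last iteration, here is a minimal sketch with made-up data. The toy data frames, the columns dt, id, x and y, and the name 'res' are all assumptions and do not come from the original post. Each pass rebinds 'res', so after the loop it holds only the merge for the final date.

```r
# Hypothetical toy data standing in for 'a', 'b' and 'dt'
a <- data.frame(dt = c(1, 1, 2), id = c(1, 2, 1), x = c(10, 20, 30))
b <- data.frame(dt = c(1, 2, 2), id = c(1, 1, 2), y = c(5, 6, 7))
dt <- sort(unique(a$dt))

for (i in seq_along(dt)) {
  # Each pass overwrites 'res' with the merge for dt[i]
  res <- merge(a[a$dt == dt[i], ], b[b$dt == dt[i], ], by = "id")
}

# The loop result equals a merge on the last date alone
n <- length(dt)
last <- merge(a[a$dt == dt[n], ], b[b$dt == dt[n], ], by = "id")
stopifnot(identical(res, last))
```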

> ----------------------------------------------------
>
>
> Now, my understanding is that the data frame c in the above code is
> malloc'ed on every iteration of the loop. Is that assumption correct?

No. Here 'c' is just a symbol, and assignment (please use <- in public code, it is easier to read) binds the symbol to the data frame returned by merge(). So the allocation (not 'malloc' necessarily) is going on inside merge(). Also, 'c' is the name of a system function, so you are confusing people by using its name for your own object.

When you assign to 'c' you change the binding to a different already allocated object. Eventually garbage collection will recover (to R) the memory allocated to objects which are no longer bound to symbols.
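
A small sketch of the binding behaviour described above may help. The names 'res', 'keep' and 'v' are hypothetical, and the details of when memory is actually reclaimed are up to R's garbage collector, so the gc() call here is only illustrative.

```r
# 'res' and 'keep' are hypothetical names; any symbols behave the same way
res <- data.frame(id = 1:3)   # the allocation happens inside data.frame()
keep <- res                   # a second symbol bound to the same object
res <- data.frame(id = 4:6)   # rebinds 'res'; the first object is untouched
stopifnot(identical(keep$id, 1:3))

rm(keep)         # the first data frame is no longer bound to any symbol
invisible(gc())  # a collection may now reclaim its memory

# A variable named 'c' does not break the function c(), because R looks
# for a function when a name is called, but it makes code harder to read
c <- 1
v <- c(c, 2)
rm(c)
stopifnot(identical(v, c(1, 2)))
```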

I am not aware of any account which describes in detail how R works at this level, and end users do not need to know it. (It is also the case that R maintains a number of illusions and internally may not do what it appears to do.)

>
> Is the following attempt a better way of doing things?
>
> ----------------------------------------------------
> a = data.frame(somedata)
> b = data.frame(somedata)
>
> # Pre-allocate data frame c
>
> c = data.frame(for some size);
>
> for(i in 1:nrow(dt)) {
> # Merge dates for a given date into a new data frame
> # and copy the result into c
>
> copy(c, merge(a[a$dt==dt[i],],b[b$dt == dt[i],], by=c(some column)));
>
> }
> ----------------------------------------------------
>
> Now the question is, how can I copy the merged data into my
> pre-allocated data frame c? I tried rbind/cbind but they are pretty
> fussy about having the right names and dimensions, hence it fails.
>
> Any help would be greatly appreciated!
>
> Thanks.
>
> Manoj
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
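
One possible way to keep every per-date result, rather than copying into a pre-allocated data frame, is to collect the merges in a list and bind them once at the end. This is only a sketch under assumptions: the toy data, the key columns dt and id, and the names 'pieces' and 'combined' are hypothetical and not from the original post. Because every piece comes from the same merge() call, the column names agree, which is what rbind() requires.

```r
# Hypothetical toy data standing in for 'a', 'b' and 'dt'
a <- data.frame(dt = c(1, 1, 2), id = c(1, 2, 1), x = c(10, 20, 30))
b <- data.frame(dt = c(1, 2, 2), id = c(1, 1, 2), y = c(5, 6, 7))
dt <- sort(unique(a$dt))

# One merged data frame per date, held in a list
pieces <- lapply(dt, function(d) {
  merge(a[a$dt == d, ], b[b$dt == d, ], by = c("dt", "id"))
})

# Bind all pieces in a single step instead of growing a data frame in the loop
combined <- do.call(rbind, pieces)
print(combined)
```

If dt is simply one of the key columns, a single call such as merge(a, b, by = c("dt", "id")) may make the loop unnecessary altogether.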

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu Jun 29 17:29:35 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 29 Jun 2006 - 18:12:11 EST.
