Re: [Rd] [datatable-help] speeding up perception

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Tue, 05 Jul 2011 21:01:52 -0400

On Jul 5, 2011, at 7:18 PM, <luke-tierney_at_uiowa.edu> wrote:

> On Tue, 5 Jul 2011, Simon Urbanek wrote:
>
>>
>> On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:
>>
>>> Simon (and all),
>>>
>>> I've tried to make assignment as fast as calling `[<-.data.table`
>>> directly, for user convenience. Profiling shows (IIUC) that it isn't
>>> dispatch, but x being copied. Is there a way to prevent '[<-' from
>>> copying x?
>>
>> Good point, and conceptually, no. It's a subassignment after all -
>> see R-lang 3.4.4 - it is equivalent to
>>
>> `*tmp*` <- x
>> x <- `[<-`(`*tmp*`, i, j, value)
>> rm(`*tmp*`)
>>
>> so there is always a copy involved.
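The desugaring Simon describes can be checked from R itself. This is a minimal sketch that assumes a hypothetical replacement function `second<-`, which is not part of base R or data.table. The replacement-call syntax and an explicit call to `second<-` give the same result. Another variable that shared the original vector is left untouched, which is the pass-by-value behaviour the conceptual copy protects.

```r
# `second<-` is a hypothetical replacement function, defined here only for illustration.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- c(10, 20, 30)
second(x) <- 99                  # replacement-call syntax

y <- c(10, 20, 30)
y <- `second<-`(y, value = 99)   # explicit call, roughly what the evaluator does via `*tmp*`

identical(x, y)                  # TRUE

z <- c(1, 2, 3)
w <- z                           # w and z share one vector until one of them is modified
second(w) <- 0
z                                # still 1 2 3: the value seen through z was not modified
```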
>>
>> Now, a conceptual copy doesn't mean real copy in R since R tries to
>> keep the pass-by-value illusion while passing references in cases
>> where it knows that modifications cannot occur and/or they are
>> safe. The default subassign method uses that feature which means it
>> can afford to not duplicate if there is only one reference -- then
>> it's safe to not duplicate as we are replacing that only existing
>> reference. And in the case of a matrix, that will be true at the
>> latest from the second subassignment on.
>>
>> Unfortunately the method dispatch (AFAICS) introduces one more
>> reference in the dispatch chain so there will always be two
>> references so duplication is necessary. Since we have only 0 / 1 /
>> 2+ information on the references, we can't distinguish whether the
>> second reference is due to the dispatch or due to the passed object
>> having more than one reference, so we have to duplicate in any
>> case. That is unfortunate, and I don't see a way around it (unless we
>> handle subassignment methods in some special way).
>
> I don't believe dispatch is bumping NAMED (and a quick experiment
> seems to confirm this though I don't guarantee I did that right). The
> issue is that a replacement function implemented as a closure, which
> is the only option for a package, will always see NAMED on the object
> to be modified as 2 (because the value is obtained by forcing the
> argument promise) and so any R level assignments will duplicate. This
> also isn't really an issue of imprecise reference counting -- there
> really are (at least) two legitimate references -- one through the
> argument and one through the caller's environment.
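Luke's point about closures can be observed with `tracemem()`, which reports when R duplicates a marked object. This is a minimal sketch that assumes R was built with memory profiling, checked here via `capabilities("profmem")`. It uses a hypothetical closure replacement function `first<-`. The copies reported depend on the R version, since the reference tracking discussed here (NAMED) has changed in later releases, so no output is shown.

```r
# `first<-` is a hypothetical replacement function written as an R closure,
# i.e. the only kind of replacement function a package can define at R level.
`first<-` <- function(x, value) {
  x[1] <- value   # x is also reachable from the caller, so this is expected to duplicate
  x
}

v <- runif(1e6)
traced <- capabilities("profmem")    # tracemem() needs an R build with memory profiling
if (traced) invisible(tracemem(v))

v[1] <- 0          # built-in `[<-` on a singly referenced vector: typically no copy reported
first(v) <- 1      # closure replacement function: expect tracemem to report a copy

if (traced) untracemem(v)
```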
>
> It would be good if we could come up with a way for packages to be
> able to define replacement functions that do not duplicate in cases
> where we really don't want them to, but this would require coming up
> with some sort of protocol, minimally involving an efficient way to
> detect whether a replacement function is being called in a replacement
> context or directly.

Would "$<-" always satisfy that condition? It would be a big help to me if it could be designed to avoid duplicating the rest of the data.frame.

-- 


>
> There are some replacement functions that use C code to cheat, but
> these may create problems if called directly, so I won't advertise
> them.
>
> Best,
>
> luke
>
>>
>> Cheers,
>> Simon
>>
>>
>>
>
> --
> Luke Tierney
> Statistics and Actuarial Science
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa Phone: 319-335-3386
> Department of Statistics and Fax: 319-335-3017
> Actuarial Science
> 241 Schaeffer Hall email: luke_at_stat.uiowa.edu
> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
David Winsemius, MD
West Hartford, CT
Received on Wed 06 Jul 2011 - 01:03:19 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 06 Jul 2011 - 01:40:07 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
