From: Seth Falcon <seth_at_userprimary.net>

Date: Fri, 01 Jan 2010 16:56:02 -0800

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 02 Jan 2010 - 00:58:51 GMT


On 1/1/10 1:40 PM, Peng Yu wrote:

> On Fri, Jan 1, 2010 at 6:52 AM, Barry Rowlingson
> <b.rowlingson_at_lancaster.ac.uk> wrote:

>> On Thu, Dec 31, 2009 at 11:27 PM, Peng Yu <pengyu.ut_at_gmail.com> wrote:
>>> I don't see where describes the implementation of '[]'.
>>>
>>> For example, if x is a matrix or a data.frame, how the lookup of
>>> 'colname1' is x[, 'colname1'] executed. Does R perform a lookup in the
>>> a hash of the colnames? Is the reference O(1) or O(n), where n is the
>>> second dim of x?
>>
>> Where have you looked? I doubt this kind of implementation detail is
>> in the .Rd documentation since a regular user doesn't care for it.

> I'm not complaining that it is not documented.

>> As Obi-wan Kenobi may have said in Star Wars: "Use the source, Luke!":
>>
>> Line 450 of subscript.c of the source code of R 2.10 is the
>> stringSubscript function. It has this comment:
>>
>> /* The original code (pre 2.0.0) used a ns x nx loop that was too
>>  * slow. So now we hash. Hashing is expensive on memory (up to 32nx
>>  * bytes) so it is only worth doing if ns * nx is large. If nx is
>>  * large, then it will be too slow unless ns is very small.
>>  */

> Could you explain what ns and nx represent?

integers :-)

Consider a 5x5 matrix m and a call like m[ , c("C", "D")]. Then, in the call to stringSubscript:

s - The character vector of subscripts, here c("C", "D")

ns - length of s, here 2

nx - length of the dimension being subscripted, here 5

names - the dimnames being subscripted. Here, perhaps

>> The definition of "large" and "small" here appears to be such that:
>>
>> 457: Rboolean usehashing = in && ( ((ns > 1000 && nx) || (nx > 1000 &&
>> ns)) || (ns * nx > 15*nx + ns) );

The 'in' argument is always TRUE AFAICS so this boils down to:

Use hashing for x[i] if either length(x) > 1000 or length(i) > 1000 (and we aren't in the trivial case where either length(x) == 0 or length(i) == 0)

OR use hashing if (ns * nx > 15*nx + ns)
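
To make the two cases concrete, here is a rough standalone sketch in C. It is not R's actual code: use_hashing only restates the condition quoted from line 457 above, and match_naive is a hypothetical helper illustrating the kind of ns x nx loop the source comment says was used before 2.0.0. The function names and the use of long for the lengths are my own assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Restates the condition quoted from line 457 of subscript.c (R 2.10).
 * ns = number of requested names, nx = length of the dimension.
 * The name use_hashing and the long types are assumptions for this sketch. */
bool use_hashing(bool in, long ns, long nx)
{
    /* One side is above 1000 and the other side is not empty. */
    bool big_and_nontrivial = (ns > 1000 && nx != 0) || (nx > 1000 && ns != 0);
    /* The ns x nx scan would cost more than the quoted threshold. */
    bool product_is_large = ns * nx > 15 * nx + ns;
    return in && (big_and_nontrivial || product_is_large);
}

/* Hypothetical helper, not taken from R: a plain ns x nx scan.
 * For each s[i], stores the 0-based position of the first matching
 * entry of names in out[i], or -1 if there is no match. */
void match_naive(const char **s, size_t ns,
                 const char **names, size_t nx, long *out)
{
    for (size_t i = 0; i < ns; i++) {
        out[i] = -1;
        for (size_t j = 0; j < nx; j++) {
            if (strcmp(s[i], names[j]) == 0) {
                out[i] = (long)j;
                break;
            }
        }
    }
}
```

For the 5x5 example, use_hashing(true, 2, 5) is false, since 2 * 5 = 10 is not greater than 15 * 5 + 2 = 77, so under this reading the plain scan would be used there.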

+ seth

