Re: [Rd] [datatable-help] speeding up perception

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Mon, 11 Jul 2011 21:23:45 -0400

Matthew,

I was hoping I had misunderstood your first proposal, but I suspect I did not ;).

Personally, I find DT[1,V1 <- 3] highly disturbing - I would expect it to evaluate to { V1 <- 3; DT[1, V1] }
thus returning the first element of the third column.
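
To make that expectation concrete, here is a minimal sketch with a plain data.frame. The toy DF below is made up for illustration and is not the example from this thread. Under standard evaluation the assignment is carried out in the caller's frame and its value is then used as the column index:

```r
# Toy data, assumed for illustration only.
DF <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# The second argument is an ordinary R expression evaluated in the
# caller's frame: it creates a variable V1 there, and its value (3)
# is then used as the column index.
res <- DF[1, V1 <- 3]

print(V1)        # 3: a new variable in the calling frame
print(res)       # 7: first element of the third column
print(DF$V1[1])  # 1: the column V1 of DF is untouched
```

So with a data.frame nothing in DF changes; only a stray variable V1 appears in the workspace.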

I do understand that within(foo, expr, ...) was the motivation for passing expressions, but unlike within(), the subsetting operator [ is not expected to take an expression as its second argument. Such abuse is quite unexpected and, I would say, dangerous.
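
For comparison, a minimal sketch of what within() does with an expression on a plain data.frame (again toy data, made up for illustration): the expression is evaluated against the columns of the data and a modified copy is returned, while the original object is left alone:

```r
# Toy data, assumed for illustration only.
DF <- data.frame(V1 = c(1, 1, 1), V2 = c(1, 1, 1))

# within() evaluates the expression with the columns of DF visible
# and returns a modified copy; DF itself is not changed.
DF2 <- within(DF, V1[1] <- 3)

print(DF$V1[1])   # 1: original unchanged
print(DF2$V1[1])  # 3: the returned copy carries the change
```

Here the expression argument is part of the documented interface, so nobody is surprised that it is not evaluated in the caller's frame.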

That said, I don't think it works, either. Taking your example and data.table from R-Forge:

> m = matrix(1,nrow=100000,ncol=100)
> DF = as.data.frame(m)
> DT = as.data.table(m)
> for (i in 1:1000) DT[1,V1 <- 3]
> DT[1,]

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
[1,]  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1   1   1   1

As you can see, DT is not modified.

Also, I suspect there is something quite amiss, because even trivial things don't work:

> DF[1:4,1:4]
  V1 V2 V3 V4
1  3  1  1  1
2  1  1  1  1
3  1  1  1  1
4  1  1  1  1
> DT[1:4,1:4]
[1] 1 2 3 4

When I first saw your proposal, I thought you rather had something like within(DT, V1[1] <- 3)
in mind, which looks innocent enough but performs terribly (note that I had to scale down the loop by a factor of 100!!!):

> system.time(for (i in 1:10) within(DT, V1[1] <- 3))