Reference to SO: http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels
On Apr 11, 2011, at 23:53, Joris Meys wrote:

> Based on a discussion on SO I ran some tests and found that converting
> to a factor is best done early in the process. Hence, I propose to
> rewrite the gl() function as :
>
> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
>   rep(
>     rep(
>       factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n)
>     ),length.out=length
>   )
> }
>
That's bizarre! You are relying on an optimization in rep.factor whereby it replicates the internal codes and exploits the fact that the result has the same structure as the input. That is, it just tacks on the class and levels attributes rather than calling match() as factor() does internally.
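A small sketch of that point, using a toy factor (`f` and `r` are names introduced here for illustration only): replicating a factor leaves the levels and class alone and simply repeats the integer codes, so nothing has to be matched against the levels again.

```r
# Toy factor; `f` and `r` are illustrative names, not from the thread.
f <- factor(c("a", "b", "c"))
r <- rep(f, times = 2)

# The replicated factor keeps the same levels and class as the input ...
stopifnot(identical(levels(r), levels(f)))
stopifnot(identical(class(r), class(f)))

# ... and its integer codes are just the input codes, repeated.
stopifnot(identical(as.integer(r), rep(as.integer(f), times = 2)))
```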

However, you can do the same thing straight away:

> gl2
function (n, k, length = n * k, labels = 1:n, ordered = FALSE)
{
    y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
    structure(y, levels=as.character(labels), class=c(if(ordered)"ordered","factor"))
}

I get this to be a bit faster than your version, although with a smaller speedup factor, which probably just indicates that match() is faster on this machine.
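For anyone wanting to try this, here is a minimal sketch that checks the structure()-based gl2 against gl() for the same four argument patterns used in the thread. The sizes are illustrative assumptions, chosen far smaller than 1e7 so the block runs instantly; `checks` is a name introduced here.

```r
# gl2 as given in the message above, repeated so this block runs on its own.
gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE) {
  # Build the integer codes directly, then attach levels and class.
  y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
  structure(y, levels = as.character(labels),
            class = c(if (ordered) "ordered", "factor"))
}

# Illustrative sizes, much smaller than the 1e7 used in the thread.
checks <- c(
  plain   = isTRUE(all.equal(gl(5, 1e4), gl2(5, 1e4))),
  recycle = isTRUE(all.equal(gl(5, 100, 1e4), gl2(5, 100, 1e4))),
  labels  = isTRUE(all.equal(gl(5, 100, 1e4, labels = letters[1:5]),
                             gl2(5, 100, 1e4, labels = letters[1:5]))),
  ordered = isTRUE(all.equal(
    gl(5, 100, 1e4, labels = letters[1:5], ordered = TRUE),
    gl2(5, 100, 1e4, labels = letters[1:5], ordered = TRUE)))
)
print(checks)

# Timings depend on the machine and R version, so no expected values are given.
print(system.time(gl2(5, 1e5)))
```

Whether any speed difference shows up will depend on the R version in use, so the timing line is only a starting point for your own measurements.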

> > system.time(X1 <- gl(5,1e7))
>    user  system elapsed
>   29.21    0.30   29.58
> > system.time(X2 <- gl2(5,1e7))
>    user  system elapsed
>    1.87    0.45    2.37
> > all.equal(X1,X2)
> [1] TRUE
>
> > system.time(X1 <- gl(5,100,1e7))
>    user  system elapsed
>    5.98    0.05    6.05
> > system.time(X2 <- gl2(5,100,1e7))
>    user  system elapsed
>    0.21    0.03    0.25
> > all.equal(X1,X2)
> [1] TRUE
>
> > system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
>    user  system elapsed
>    5.88    0.02    5.98
> > system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
>    user  system elapsed
>    0.20    0.05    0.25
> > all.equal(X1,X2)
> [1] TRUE
>
> > system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T))
>    user  system elapsed
>    5.82    0.03    5.89
> > system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T))
>    user  system elapsed
>    0.22    0.04    0.25
> > all.equal(X1,X2)
> [1] TRUE


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Tue 12 Apr 2011 - 07:06:30 GMT

