Re: [R] saving a character vector

From: Philippe Grosjean <phgrosjean_at_sciviews.org>
Date: Sun 05 Feb 2006 - 19:22:55 EST

If I understand the question correctly, both Jim Holtman's and John Fox's answers are correct solutions. However, they are not optimal (optimizing the code was not the question, but it is worth discussing a little).

The difference is that Jim creates an empty character vector and concatenates to it (the simplest code), whereas John creates a vector of empty strings of the correct size [result <- rep("", n.item * (n.item - 1) / 2)]. The second solution is supposed to be better, because "result" already has the right size, which avoids reallocating and copying the vector at each loop iteration. However:

 > system.time(generateIndex1(100))
[1] 4.86 0.00 4.86 NA NA
 > system.time(generateIndex2(100))
[1] 4.68 0.00 4.68 NA NA
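
For reference, generateIndex1() and generateIndex2() are not defined in this message. The following is only a sketch of what they presumably look like, assuming they wrap the growing-vector and preallocated-vector approaches described above. The function bodies, the loop bound 1:(n.item - 1) and the use of width = 3 (instead of digits = 2) for the zero padding are assumptions made for this sketch, and n.item >= 2 is assumed.

```r
# Assumed reconstruction: grow the result by concatenation (Jim's approach)
generateIndex1 <- function(n.item) {
    result <- character()                        # start with an empty vector
    for (i in 1:(n.item - 1)) {
        for (j in (i + 1):n.item) {
            # c() builds a new, longer vector at every iteration
            result <- c(result, paste("i",
                formatC(i, width = 3, flag = "0"), ".",
                formatC(j, width = 3, flag = "0"), sep = ""))
        }
    }
    result
}

# Assumed reconstruction: preallocate the result (John's approach)
generateIndex2 <- function(n.item) {
    result <- rep("", n.item * (n.item - 1) / 2)  # final length known in advance
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in (i + 1):n.item) {
            index <- index + 1
            result[index] <- paste("i",
                formatC(i, width = 3, flag = "0"), ".",
                formatC(j, width = 3, flag = "0"), sep = "")
        }
    }
    result
}

# Both versions should return exactly the same labels
print(identical(generateIndex1(10), generateIndex2(10)))
```

Both functions still call formatC() twice per iteration, so under these assumptions they do the same amount of formatting work and differ only in how "result" is filled.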

There is not much difference (indeed, the loops and what is calculated repeatedly inside them take much more time in this case). However, I wonder what happens if I allocate a vector of the right size whose strings also have the right size:

generateIndex3 <- function(n.item) {

     result <- rep("i000.000", n.item * (n.item - 1) / 2)
     index <- 0
     for (i in 1:(n.item - 1)) {
         for (j in ((i + 1):n.item)) {
             index <- index + 1
             result[index] <- paste("i",
                 formatC(i, digits = 2, flag = "0"), ".",
                 formatC(j, digits = 2, flag = "0"), sep = "")
         }
     }
     result

}

 > system.time(generateIndex3(100))
[1] 4.63 0.02 4.66 NA NA

... About the same. **Could someone explain this to me, please?**

Now, where is the bottleneck?

 > Rprof()
 > res <- generateIndex3(100)
 > Rprof(NULL)
 > ?summaryRprof
 > summaryRprof()
$by.self
                    self.time self.pct total.time total.pct
formatC                 0.48     10.5       4.30      93.9
paste                   0.46     10.0       4.54      99.1
pmax                    0.44      9.6       0.66      14.4
as.integer              0.30      6.6       0.34       7.4
as.logical              0.24      5.2       0.34       7.4
names                   0.20      4.4       0.24       5.2
...
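
The profiling session above can also be run as a script. This is a minimal sketch, assuming a stand-in function slowIndex() (a hypothetical name, with width = 3 used for the padding) and a temporary file for the profile data. The guards are there because Rprof() is not available in R builds without profiling support, and summaryRprof() signals an error when no samples were recorded.

```r
# Hypothetical stand-in for the function being profiled (n.item >= 2 assumed)
slowIndex <- function(n.item) {
    result <- character(n.item * (n.item - 1) / 2)
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in (i + 1):n.item) {
            index <- index + 1
            # formatC() is deliberately called twice per iteration
            result[index] <- paste("i",
                formatC(i, width = 3, flag = "0"), ".",
                formatC(j, width = 3, flag = "0"), sep = "")
        }
    }
    result
}

prof.file <- tempfile(fileext = ".out")   # keep the profile out of the working directory

# Start sampling; fall back gracefully if profiling is unavailable
started <- tryCatch({ Rprof(prof.file); TRUE }, error = function(e) FALSE)
res <- slowIndex(150)
if (started) Rprof(NULL)                  # stop sampling

# Summarise time spent per function, if any samples were collected
prof <- tryCatch(summaryRprof(prof.file)$by.self, error = function(e) NULL)
if (!is.null(prof)) print(head(prof))
unlink(prof.file)
```

The exact functions and percentages reported will depend on the R version and the machine.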

Gosh! Of course: why do I call formatC() twice on every iteration of the loop? I can increase speed by formatting my character strings only once!

generateIndex4 <- function(n.item) {

     result <- rep("i000.000", n.item * (n.item - 1) / 2)
     index <- 0
     id <- formatC(1:n.item, digits = 2, flag = "0")
     for (i in 1:(n.item - 1)) {
         for (j in ((i + 1):n.item)) {
             index <- index + 1
             result[index] <- paste("i", id[i], ".", id[j], sep = "")
         }
     }
     result

}

 > system.time(generateIndex4(100))
[1] 0.33 0.00 0.33 NA NA

Yes! That's much better.
Now, recalling that a vectorized algorithm is usually better than loops, could I get rid of these two ugly loops? Here is something using outer() and lower.tri():

generateIndex5 <- function(n.item) {

     idx <- function(x, y) paste("i", x, ".", y, sep = "")
     id <- formatC(1:n.item, digits = 2, flag = "0")
     allidx <- t(outer(id, id, idx))
     allidx[lower.tri(allidx)]

}
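
To see why the function transposes the matrix before applying lower.tri(), here is a small sketch for four items. The variable names and the use of sprintf() to build the padded ids are assumptions made for this illustration only.

```r
id <- sprintf("%03d", 1:4)                    # "001" "002" "003" "004"
idx <- function(x, y) paste("i", x, ".", y, sep = "")

m <- outer(id, id, idx)    # m[r, c] is "i<id[r]>.<id[c]>"
tm <- t(m)                 # tm[r, c] is "i<id[c]>.<id[r]>"

# Matrices are read column by column, so the lower triangle of the
# transposed matrix lists all partners of item 1, then of item 2, ...
by.lower <- tm[lower.tri(tm)]
print(by.lower)

# The upper triangle of the untransposed matrix holds the same labels,
# but column-wise reading interleaves them differently
by.upper <- m[upper.tri(m)]
print(by.upper)
```

The first vector follows the order produced by the nested loops (i001.002, i001.003, i001.004, i002.003, ...), whereas the second starts i001.002, i001.003, i002.003, i001.004.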

 > system.time(generateIndex5(100))
[1] 0.02 0.00 0.02 NA NA

Indeed! That code is much, much faster!
Now, let's compare generateIndex1() with generateIndex5(): 4.86 versus 0.02 seconds of user time for n.item = 100, according to the timings above.
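
A sketch of how such a comparison could be rerun, checking first that both versions return identical results. The names growIndex() and outerIndex() are hypothetical stand-ins for the slowest and fastest variants, width = 3 is used for the padding, and n.item >= 2 is assumed. Timings will vary between machines and R versions.

```r
# Hypothetical stand-in for the slow variant: grows the vector and
# formats both numbers inside the inner loop
growIndex <- function(n.item) {
    result <- character()
    for (i in 1:(n.item - 1)) {
        for (j in (i + 1):n.item) {
            result <- c(result, paste("i",
                formatC(i, width = 3, flag = "0"), ".",
                formatC(j, width = 3, flag = "0"), sep = ""))
        }
    }
    result
}

# Hypothetical stand-in for the vectorized variant using outer() and lower.tri()
outerIndex <- function(n.item) {
    id <- formatC(1:n.item, width = 3, flag = "0")   # format each number once
    allidx <- t(outer(id, id, function(x, y) paste("i", x, ".", y, sep = "")))
    allidx[lower.tri(allidx)]
}

# Correctness first, speed second
same <- identical(growIndex(100), outerIndex(100))
print(same)
print(system.time(growIndex(100)))
print(system.time(outerIndex(100)))
```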

Final conclusion:
generateIndex5() is better R code (I am sure one can do even better!), but it takes a little more intellectual work to arrive at this result (i.e., rethinking the problem in terms of matrix calculation). However, the result is worth the effort.
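
As one possible direction for "even better", here is a sketch that builds the index pairs directly with rep() and sequence(), so that only the n.item * (n.item - 1) / 2 needed strings are created instead of a full n.item x n.item matrix. The name pairIndex() is hypothetical and sprintf() replaces formatC() and paste() here. Whether this is actually faster has not been measured and would need to be checked with system.time().

```r
# Hypothetical alternative: enumerate the (i, j) pairs without a matrix
pairIndex <- function(n.item) {
    stopifnot(n.item >= 2)
    counts <- (n.item - 1):1                  # number of partners j for each i
    i <- rep(seq_len(n.item - 1), times = counts)
    j <- i + sequence(counts)                 # j runs from i + 1 to n.item
    sprintf("i%03d.%03d", i, j)               # zero-pad both numbers to 3 digits
}

print(pairIndex(4))
```

The pairs come out in the same order as in the nested loops: all partners of item 1 first, then those of item 2, and so on.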

(Note: this will be included in the future R Wiki. This is the reason why this email is so long: I took the opportunity to talk about code optimization.)

Best,

Philippe Grosjean

..............................................<}))><........
  ) ) ) ) )
( ( ( ( ( Prof. Philippe Grosjean
  ) ) ) ) )
( ( ( ( ( Numerical Ecology of Aquatic Systems
  ) ) ) ) ) Mons-Hainaut University, Pentagone (3D08)
( ( ( ( (
..............................................................

jim holtman wrote:

> Is this what you want?  It returns a character vector with the values:
> 
> 
>>generate.index<-function(n.item){
> 
> + .return <- character()  # initialize vector
> + for (i in 1:n.item)
> +    {
> +        for (j in ((i+1):n.item))
> +            {
> + # concatenate the results
> + .return <- c(.return,
> paste("i",formatC(i,digits=2,flag="0"),".",formatC(j,digits=2,flag="0"),sep=""))
> +
> +            }
> +
> +    }
> +    .return
> +  }
> 
>>
>>generate.index(10)
> 
>  [1] "i001.002" "i001.003" "i001.004" "i001.005" "i001.006" "i001.007"
>  [7] "i001.008" "i001.009" "i001.010" "i002.003" "i002.004" "i002.005"
> [13] "i002.006" "i002.007" "i002.008" "i002.009" "i002.010" "i003.004"
> [19] "i003.005" "i003.006" "i003.007" "i003.008" "i003.009" "i003.010"
> [25] "i004.005" "i004.006" "i004.007" "i004.008" "i004.009" "i004.010"
> [31] "i005.006" "i005.007" "i005.008" "i005.009" "i005.010" "i006.007"
> [37] "i006.008" "i006.009" "i006.010" "i007.008" "i007.009" "i007.010"
> [43] "i008.009" "i008.010" "i009.010" "i010.011" "i010.010"
> 
> 
> 
> 
> On 2/4/06, Taka Matzmoto <sell_mirage_ne@hotmail.com> wrote:
> 
>>Hi R users
>>
>>I wrote a function that generates some character strings.
>>
>>generate.index<-function(n.item){
>>for (i in 1:n.item)
>>   {
>>       for (j in ((i+1):n.item))
>>           {
>>
>>
>>cat("i",formatC(i,digits=2,flag="0"),".",formatC(j,digits=2,flag="0"),"\n",sep="")
>>
>>           }
>>
>>   }
>>                               }
>>
>>I like to save what appears on the screen when I run using
>>generate.index(10) as a character vector
>>
>>I used
>>temp <- generate.index(10)
>>
>>but it didn't work.
>>
>>Could you provide some advice on this issue?
>>
>>Thanks in advance
>>
>>TM
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html
>>
> 
> 
> 
> 
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 247 0281
> 
> What the problem you are trying to solve?
> 
> 	[[alternative HTML version deleted]]
> 
> 
>

Received on Sun Feb 05 19:32:12 2006

This archive was generated by hypermail 2.1.8 : Mon 06 Feb 2006 - 03:48:58 EST