# Re: [R] saving a character vector

From: Philippe Grosjean <phgrosjean_at_sciviews.org>
Date: Sun 05 Feb 2006 - 19:22:55 EST

If I understand the question correctly, both Jim Holtman's and John Fox's answers are correct solutions. However, they are not optimal ones. (The question was not "optimize my code, please", but it is worth talking about it a little.)

• Jim proposes (I reworked his code a little):

    generateIndex1 <- function(n.item) {
        Res <- character(0)  # initialize vector
        for (i in 1:(n.item - 1)) {  # John Fox's correction introduced
            for (j in ((i+1):n.item)) {
                # concatenate the results
                Res <- c(Res, paste("i",
                    formatC(i, digits = 2, flag = "0"), ".",
                    formatC(j, digits = 2, flag = "0"), sep = ""))
            }
        }
        Res
    }

• John Fox proposes:

    generateIndex2 <- function(n.item) {
        result <- rep("", n.item * (n.item - 1) / 2)
        index <- 0
        for (i in 1:(n.item - 1)) {
            for (j in ((i + 1):n.item)) {
                index <- index + 1
                result[index] <- paste("i",
                    formatC(i, digits = 2, flag = "0"), ".",
                    formatC(j, digits = 2, flag = "0"), sep = "")
            }
        }
        result
    }

The difference is that Jim creates an empty character vector and concatenates to it (the simplest code), whereas John creates a vector of empty strings of the correct size [result <- rep("", n.item * (n.item - 1) / 2)]. The second solution is supposed to be better, because "result" already has the right length, which avoids reallocating and copying the vector at each loop iteration. However:

> system.time(generateIndex1(100))
 4.86 0.00 4.86 NA NA
> system.time(generateIndex2(100))
 4.68 0.00 4.68 NA NA

There is not much difference (indeed, the loops and what is calculated repeatedly inside them take much more time in this case). However, I wonder what happens if I allocate a vector of the right size whose strings also have the right size:

```
generateIndex3 <- function(n.item) {
    result <- rep("i000.000", n.item * (n.item - 1) / 2)
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in ((i + 1):n.item)) {
            index <- index + 1
            result[index] <- paste("i",
                formatC(i, digits = 2, flag = "0"), ".",
                formatC(j, digits = 2, flag = "0"), sep = "")
        }
    }
    result
}
```

> system.time(generateIndex3(100))
 4.63 0.02 4.66 NA NA

Now, where is the bottleneck?

```
> Rprof()
> res <- generateIndex3(100)
> Rprof(NULL)
> ?summaryRprof
> summaryRprof()
$by.self
             self.time self.pct total.time total.pct
formatC           0.48     10.5       4.30      93.9
paste             0.46     10.0       4.54      99.1
pmax              0.44      9.6       0.66      14.4
as.integer        0.30      6.6       0.34       7.4
as.logical        0.24      5.2       0.34       7.4
names             0.20      4.4       0.24       5.2
...
```
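The profiling session above can be repeated along the following lines. This is only a sketch: the temporary file, the 0.01 s sampling interval and the stand-in function slowLabels() are assumptions made for illustration and are not part of the original session. The tryCatch() guards against a run that is too short to collect any samples.

```r
# Hypothetical stand-in for generateIndex3(): it formats inside the loops.
# sprintf("%03d", ...) replaces the formatC() call (an assumption).
slowLabels <- function(n.item) {
    result <- character(n.item * (n.item - 1) / 2)
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in (i + 1):n.item) {
            index <- index + 1
            result[index] <- paste("i", sprintf("%03d", i), ".",
                                   sprintf("%03d", j), sep = "")
        }
    }
    result
}

prof.file <- tempfile(fileext = ".out")  # avoid writing Rprof.out in the working directory
Rprof(prof.file, interval = 0.01)         # start sampling the call stack
res <- slowLabels(100)
Rprof(NULL)                               # stop profiling

# summaryRprof() stops with an error when no samples were collected
prof <- tryCatch(summaryRprof(prof.file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self table lists the time spent in each function itself, while total.time also counts the functions it calls, which is why paste and formatC both show a total close to 100% above.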

Gosh! Of course: why do I call formatC() twice on every iteration of the loop? I can increase speed by formatting my character strings only once!

```
generateIndex4 <- function(n.item) {
    result <- rep("i000.000", n.item * (n.item - 1) / 2)
    index <- 0
    id <- formatC(1:n.item, digits = 2, flag = "0")
    for (i in 1:(n.item - 1)) {
        for (j in ((i + 1):n.item)) {
            index <- index + 1
            result[index] <- paste("i", id[i], ".", id[j], sep = "")
        }
    }
    result
}
```

> system.time(generateIndex4(100))
 0.33 0.00 0.33 NA NA

Yes! That's much better.
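The gain comes from the fact that formatC() is vectorized: a single call can format the whole sequence, so the loop body only has to look up ready-made strings. A minimal sketch, in which the explicit width = 3 is an assumption made here so that the zero padding does not depend on formatC() defaults:

```r
n.item <- 5

# One call per element, as happens inside the loops of generateIndex3()
one.by.one <- sapply(1:n.item, function(k) formatC(k, width = 3, flag = "0"))

# A single vectorized call, as in generateIndex4()
id <- formatC(1:n.item, width = 3, flag = "0")

identical(one.by.one, id)  # same strings, but formatC() was called only once
id[2]                      # the loop body can now simply index into id
```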
Now, recall that a vectorized algorithm is usually better than loops. Could I get rid of these two ugly loops? Here is something using outer() and lower.tri():

```
generateIndex5 <- function(n.item) {
    idx <- function(x, y) paste("i", x, ".", y, sep = "")
    id <- formatC(1:n.item, digits = 2, flag = "0")
    allidx <- t(outer(id, id, idx))
    allidx[lower.tri(allidx)]
}
```

> system.time(generateIndex5(100))
 0.02 0.00 0.02 NA NA

Indeed! That code is much, much faster!
Now, let's compare generateIndex1() with generateIndex5().

• generateIndex5() is optimized for speed (4.86/0.02, about 240 times faster!)
• generateIndex5() is more concise code: 4 lines, no loops, compared to 8 lines with two loops.
• but... generateIndex1() is the code that comes to mind more easily (except, perhaps, for some R experts (?) because thinking with vectors is second nature to them).
• but... generateIndex1() is much easier to understand when someone else reads the code (for the same reason).
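To make generateIndex5() less opaque, it helps to look at each intermediate step for a small case. The sketch below uses three items; sprintf("%03d", ...) stands in for the formatC() call (an assumption, chosen only to keep the example self-contained).

```r
id <- sprintf("%03d", 1:3)                       # "001" "002" "003"
idx <- function(x, y) paste("i", x, ".", y, sep = "")

# outer() calls idx() once on all (row, column) pairs: element [r, c] is
# "i<id[r]>.<id[c]>", so the wanted pairs (first < second) sit above the diagonal
allidx <- outer(id, id, idx)

# transposing moves those pairs below the diagonal ...
allidx <- t(allidx)

# ... where lower.tri() picks them column by column, which is the same
# order as the nested loops: (1,2), (1,3), (2,3)
allidx[lower.tri(allidx)]
# [1] "i001.002" "i001.003" "i002.003"
```

Using upper.tri() without the transpose would return the same strings, but for more than three items they would come out in a different order than the loops produce.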

Final conclusion:
generateIndex5() is better R code (I am sure one can do even better!), but it takes a little more intellectual work to arrive at this result (i.e., rethinking the problem in terms of matrix calculation). However, the result is worth the effort.
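As one possible direction for doing "even better", the index pairs can be generated directly instead of building the full n.item x n.item matrix and discarding more than half of it. The sketch below is an illustration only: generateIndex6() is a hypothetical name, and the use of sprintf(), rep() and sequence() is a substitution not found in the code above. Whether it is actually faster has not been measured here.

```r
generateIndex6 <- function(n.item) {
    if (n.item < 2) return(character(0))        # no pairs to build
    id <- sprintf("%03d", seq_len(n.item))      # labels formatted once
    run <- (n.item - 1):1                       # number of partners j for each i
    i <- rep(seq_len(n.item - 1), times = run)  # 1 1 1 2 2 3   (for n.item = 4)
    j <- i + sequence(run)                      # 2 3 4 3 4 4
    paste("i", id[i], ".", id[j], sep = "")
}

generateIndex6(4)
```

The pairs come out in the same order as in the nested loops, so the result can be compared element by element with the other versions.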

(Note: this will be included in the future R Wiki. This is the reason why this email is so long: I took the opportunity to talk about code optimization.)

Best,

Philippe Grosjean

```
..............................................<°}))><........
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone (3D08)
( ( ( ( (
..............................................................
```

jim holtman wrote:

> Is this what you want?  It returns a character vector with the values:
>
>
>>generate.index<-function(n.item){
>
> + .return <- character()  # initialize vector
> + for (i in 1:n.item)
> +    {
> +        for (j in ((i+1):n.item))
> +            {
> + # concatenate the results
> + .return <- c(.return,
> paste("i",formatC(i,digits=2,flag="0"),".",formatC(j,digits=2,flag="0"),sep=""))
> +
> +            }
> +
> +    }
> +    .return
> +  }
>
>>
>>generate.index(10)
>
>   "i001.002" "i001.003" "i001.004" "i001.005" "i001.006" "i001.007"
>   "i001.008" "i001.009" "i001.010" "i002.003" "i002.004" "i002.005"
>  "i002.006" "i002.007" "i002.008" "i002.009" "i002.010" "i003.004"
>  "i003.005" "i003.006" "i003.007" "i003.008" "i003.009" "i003.010"
>  "i004.005" "i004.006" "i004.007" "i004.008" "i004.009" "i004.010"
>  "i005.006" "i005.007" "i005.008" "i005.009" "i005.010" "i006.007"
>  "i006.008" "i006.009" "i006.010" "i007.008" "i007.009" "i007.010"
>  "i008.009" "i008.010" "i009.010" "i010.011" "i010.010"
>
>
>
>
> On 2/4/06, Taka Matzmoto <sell_mirage_ne@hotmail.com> wrote:
>
>>Hi R users
>>
>>I wrote a function that generates some character strings.
>>
>>generate.index<-function(n.item){
>>for (i in 1:n.item)
>>   {
>>       for (j in ((i+1):n.item))
>>           {
>>
>>
>>cat("i",formatC(i,digits=2,flag="0"),".",formatC(j,digits=2,flag="0"),"\n",sep="")
>>
>>           }
>>
>>   }
>>                               }
>>
>>I like to save what appears on the screen when I run using
>>generate.index(10) as a character vector
>>
>>I used
>>temp <- generate.index(10)
>>
>>but it didn't work.
>>
>>Could you provide some advice on this issue?
>>
>>
>>TM
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>http://www.R-project.org/posting-guide.html
>>
>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 247 0281
>
> What the problem you are trying to solve?
>
> 	[[alternative HTML version deleted]]
>