Re: [R] How to speed up nested for loop computations

From: jim holtman <jholtman_at_gmail.com>
Date: Fri 11 Aug 2006 - 20:40:50 EST

In your split, you used the dataframe. What you want to do is split the row numbers to give the list of indices. Once you have done this, you can 'lapply' a function over this list of indices. The one that I have below will return the index of the minimum of 'best' in each partition. You can then use this list to do further computations.

res.split <- split(1:nrow(res), list(res$instance, res$try, res$idalgo), drop=TRUE)
min.list <- lapply(res.split, function(x){
    # 'x' is the list of indices. Find the one that is the minimum of 'best'
    # notice that you have to subset/index the dataframe with 'x'
    x[match(min(res$best[x]), res$best[x])]
})
# to get a vector of the indices, do an 'unlist'
min.vector <- unlist(min.list)
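
To go from the vector of indices to the rows themselves (and on to the boxplot), something along these lines should work. This is only a sketch: the toy 'res' below is made-up data that borrows the column names from the thread, 'res.min' is a name I picked for illustration, and I have used 'which.min' in place of the 'match(min(...))' idiom. Both return the first minimum, although 'which.min' skips NA values.

```r
# Toy stand-in for 'res' (hypothetical data; only the column names follow the thread)
set.seed(1)
res <- data.frame(
    instance = rep(c("lipa80a", "tai80a"), each = 12),
    idalgo   = rep(rep(c("PIR-2opt", "SEQ-2opt"), each = 6), times = 2),
    try      = rep(rep(1:2, each = 3), times = 4),
    best     = sample(254000:256000, 24)
)

# Split the row numbers, not the data frame
res.split <- split(seq_len(nrow(res)),
                   list(res$instance, res$try, res$idalgo), drop = TRUE)

# Row index of the minimum 'best' within each partition
min.vector <- unlist(lapply(res.split, function(x) x[which.min(res$best[x])]))

# One row per partition: the best solution found in that partition
res.min <- res[min.vector, ]

# Compare the algorithms on the per-partition minima
boxplot(best ~ idalgo, data = res.min)
```

Because 'min.vector' holds row numbers of the original 'res', every other column (cpu_id, time, iteration, ...) comes along for free when you subset.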

On 8/10/06, Max Manfrin <mmanfrin@ulb.ac.be> wrote:
>
> On 10 Aug 2006, at 18:46, jim holtman wrote:
>
> > It appears that you are trying to partition the dataframe and then
> > do some operations. It is probably better to use 'split' to
> > generate the set of indices of the partitions and then do the
> > operations on the subset. Here is an example that calculates the
> > 'mean' of each partition:
> >
> > > n <- 20
> > > x <- data.frame(id=sample(1:3,n,TRUE), type=sample(1:3,n,TRUE),
> > value=runif(n))
> > > x.split <- split(1:nrow(x), list(x$id, x$type), drop=TRUE)
> > > x.split
> > $`3.1`
> > [1] 1 15 19
> >
> > $`1.1`
> > [1] 2
> ... cut ...
>
> > > # calculate the number of values in the partition and their mean
> >
> > > lapply(x.split, function(z) c(length(z),mean(x$value[z])))
> > $`3.1`
> > [1] 3.0000000 0.3120459
> >
> > $`1.1`
> > [1] 1.0000000 0.5642638
> ... cut ...
> > You should be able to extend this approach to your data.
>
> I tried to follow your suggestion. I do indeed have to partition the
> data frame: my complete data set contains, for each problem
> instance ("instance") of a given size (the number of instances of a
> given size in the example is 2), for each search algorithm ("idalgo")
> (the number of algorithms I'm testing is 78), and for each trial ("try")
> (I test each algorithm on each instance 30 times), all the best-so-far
> solution values ("best") found by every CPU (my parallel algorithm
> runs on 8 CPUs) during the search.
>
> I therefore applied to the res data frame the command
> >res.split <- split(res, list(res$instance, res$try, res$idalgo),
> drop=TRUE)
>
> For every partition (and I have 4680 partitions of the type
> instance.try.idalgo) I need to identify the best solution found (so,
> among the 8 CPUs I need to identify the one with the lowest value of
> "best"). Unluckily the split command doesn't give me back the indices
> of the rows of the res data frame as in your example, but gives me a
> "subset" of res, so I don't know how to write the lapply function
> to return the indices of the rows in res containing the minimum value
> of best for the partitions.
>
>
> I here give an example with a subset of the data:
>
> > optimal_values<-read.table("optimal_values_80.txt",header=TRUE)
> > resPIR2OPT<-read.table("parallel_independent_2-opt_80_800.txt",header=TRUE)
> > resSEQ2OPT<-read.table("sequential_2-opt_80_6400.txt",header=TRUE)
> > resSEQ22OPT<-read.table("sequential2_2-opt_80_800.txt",header=TRUE)
> >
> > res<-rbind(resPIR2OPT,resSEQ2OPT,resSEQ22OPT)
> > str(res)
> `data.frame': 14774 obs. of 11 variables:
> $ idalgo : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1 1 1 1
> 1 1 1 1 ...
> $ topo : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1
> 1 ...
> $ schema : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1 1 1 1
> 1 ...
> $ ls : int 2 2 2 2 2 2 2 2 2 2 ...
> $ type : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1 1 ...
> $ cpu_id : int 0 0 0 0 0 0 0 0 0 0 ...
> $ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1 1 1 1
> 1 ...
> $ try : int 1 1 1 1 1 1 1 1 1 1 ...
> $ best : int 255289 255250 255209 255112 254991 254971 254969
> 254897 254893 254892 ...
> $ time : num 0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72 1.78 1.93 ...
> $ iteration: int 1 1 1 2 13 18 19 22 23 26 ...
> > res.split <- split(res, list(res$instance, res$try, res$idalgo),
> drop=TRUE)
> > str(res.split)
> List of 180
> $ lipa80a.1.PIR-2opt :`data.frame': 184 obs. of 11 variables:
> ..$ idalgo : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1
> 1 1 1 1 1 1 1 ...
> ..$ topo : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ schema : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ ls : int [1:184] 2 2 2 2 2 2 2 2 2 2 ...
> ..$ type : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1
> 1 ...
> ..$ cpu_id : int [1:184] 0 0 0 0 0 0 0 0 0 0 ...
> ..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ try : int [1:184] 1 1 1 1 1 1 1 1 1 1 ...
> ..$ best : int [1:184] 255289 255250 255209 255112 254991
> 254971 254969 254897 254893 254892 ...
> ..$ time : num [1:184] 0.09 0.09 0.09 0.19 1.16 1.49 1.55 1.72
> 1.78 1.93 ...
> ..$ iteration: int [1:184] 1 1 1 2 13 18 19 22 23 26 ...
> $ lipa80a.2.PIR-2opt :`data.frame': 230 obs. of 11 variables:
> ..$ idalgo : Factor w/ 3 levels "PIR-2opt","SEQ-2opt",..: 1 1 1
> 1 1 1 1 1 1 1 ...
> ..$ topo : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ schema : Factor w/ 3 levels "PIR","SEQ","SEQ2": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ ls : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
> ..$ type : Factor w/ 2 levels "Par","Seq": 1 1 1 1 1 1 1 1 1
> 1 ...
> ..$ cpu_id : int [1:230] 0 0 0 0 0 0 0 0 0 0 ...
> ..$ instance : Factor w/ 2 levels "lipa80a","tai80a": 1 1 1 1 1 1
> 1 1 1 1 ...
> ..$ try : int [1:230] 2 2 2 2 2 2 2 2 2 2 ...
> ..$ best : int [1:230] 255557 255264 255235 255201 255193
> 255192 255186 255103 254990 254971 ...
> ..$ time : num [1:230] 0.09 0.09 0.19 0.19 0.37 1.29 1.36 1.36
> 1.58 1.89 ...
> ..$ iteration: int [1:230] 1 1 2 2 4 15 16 16 19 24 ...
>
>
> My question now is: how do I extract from each partition the row with
> the minimal best value? I need to boxplot them.
>
> Thanks again in advance for any help anybody could give.
>
> ----
> Max MANFRIN
> http://iridia.ulb.ac.be/~mmanfrin/
>
>
>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Fri Aug 11 20:48:42 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 12 Aug 2006 - 08:20:06 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.