From: Kasper Daniel Hansen <k.hansen_at_biostat.ku.dk>

Date: Sat 17 Jul 2004 - 05:59:57 EST

Date: Sat 17 Jul 2004 - 05:59:57 EST

Looking at a practical example, which basically consists of a model search, with each model being evaluated by a nnet fit, I get (using the default value of interval=0.02 - the output is basically similar for interval = 0.01)

> prof02.out$by.total[1:10,]

total.time total.pct self.time self.pct cv.risk.dsa2 2373.50 100.0 0.00 0.0 validation.risk.dsa2 2373.50 100.0 0.00 0.0 avg.pred.nnet 2370.98 99.9 4.36 1.5 dsa.2 2337.32 98.5 0.00 0.0 best.move.dsa2 2335.22 98.4 0.06 0.0 avg.nnet 2335.10 98.4 0.04 0.0 nnet 2188.22 92.2 1.74 0.6 nnet.default 2186.48 92.1 33.32 11.2 sub.dsa2 1466.32 61.8 0.18 0.1 del.dsa2 560.36 23.6 0.02 0.0

This tells me that the majority (92%) is spent in the nnet fit. Therefore it seems that the only real improvement to the code would be to change the algorithm in order to call nnet fewer times. The actual search code only amounts to a little amout of the total time.

Looking at sampling.time, the fourth column (total.time in percent) is the third column (total.time) divided by sampling.time (which makes sense).

My trouble really begins when I look at the self.time column.

> prof02.out$by.self[1:10,]

self.time self.pct total.time total.pct nnet.default 33.32 11.2 2186.48 92.1 FUN 23.42 7.8 46.34 2.0 as.integer 22.28 7.5 30.92 1.3 mean.default 16.14 5.4 22.92 1.0 as.double.default 11.52 3.9 11.52 0.5 apply 10.22 3.4 60.80 2.6 as.double 10.14 3.4 30.24 1.3 rep.default 9.98 3.3 19.96 0.8 matrix 9.62 3.2 75.86 3.2 add.net 8.80 2.9 10.16 0.4

First of, I would guess that the sum of the self.time column
_ought_??? to be equal to the sampling time, which it is not

> sum(prof02.out$by.self[,1])

[1] 298.4

> prof02.out$sampling.time

[1] 2373.5

Why does such a large discrepancy happen?

Secondly, looking at the output for as.integer, there is a big difference between total.time and self.time - but I would _guess_ as.integer to be a pretty basic function, which does not call any subfunctions, and in that case I would expect total.time to be equal to self.time.

In short : I think I grasp the total time concept, but I have a hard time understanding self.time. Specifically

- Why is the sampling time so different from the sum of the self.time
- Which function calls (or operations) are considered to belong to a given function's self.time and which are not.
- Why is total.time != self.time for the as.integer function.

I guess some of the answers would be clear if I had a firm grasp of the inner workings of R :)

Thanks in advance, Kasper

-- Kasper Daniel Hansen, Research Assistant Department of Biostatistics, University of Copenhagen ______________________________________________ R-help@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.htmlReceived on Sat Jul 17 06:12:24 2004

*
This archive was generated by hypermail 2.1.8
: Fri 18 Mar 2005 - 02:36:40 EST
*