[R] interpreting profiling output

From: Kasper Daniel Hansen <k.hansen_at_biostat.ku.dk>
Date: Sat 17 Jul 2004 - 05:59:57 EST


I have some trouble interpreting the output from profiling. I have read the help pages Rprof, summaryRprof and consult the R extensions manual, but I still have problems understanding the output.

Basically the output consist of self.time and total.time. I have the understanding that total.time is the time spent in a given function including any subcalls or child functions or whatever the technical term for that activity is. In contrasts self.time is the time spent in the function excluding subcalls.

Now, in my understanding basically everything in R is functions. I would then guess that for almost any function (except the "atomic ones") the self.time would be very small as it would spend most of its time calling other functions (again, since almost everything is a function). So how do R determine when a subfunction is called?

Looking at a practical example, which basically consists of a model search, with each model being evaluated by a nnet fit, I get (using the default value of interval=0.02 - the output is basically similar for interval = 0.01)

> prof02.out$by.total[1:10,]

                     total.time total.pct self.time self.pct
cv.risk.dsa2            2373.50     100.0      0.00      0.0
validation.risk.dsa2    2373.50     100.0      0.00      0.0
avg.pred.nnet           2370.98      99.9      4.36      1.5
dsa.2                   2337.32      98.5      0.00      0.0
best.move.dsa2          2335.22      98.4      0.06      0.0
avg.nnet                2335.10      98.4      0.04      0.0
nnet                    2188.22      92.2      1.74      0.6
nnet.default            2186.48      92.1     33.32     11.2
sub.dsa2                1466.32      61.8      0.18      0.1
del.dsa2                 560.36      23.6      0.02      0.0

This tells me that the majority (92%) is spent in the nnet fit. Therefore it seems that the only real improvement to the code would be to change the algorithm in order to call nnet fewer times. The actual search code only amounts to a little amout of the total time.

Looking at sampling.time, the fourth column (total.time in percent) is the third column (total.time) divided by sampling.time (which makes sense).

My trouble really begins when I look at the self.time column.

> prof02.out$by.self[1:10,]

                  self.time self.pct total.time total.pct
nnet.default          33.32     11.2    2186.48      92.1
FUN                   23.42      7.8      46.34       2.0
as.integer            22.28      7.5      30.92       1.3
mean.default          16.14      5.4      22.92       1.0
as.double.default     11.52      3.9      11.52       0.5
apply                 10.22      3.4      60.80       2.6
as.double             10.14      3.4      30.24       1.3
rep.default            9.98      3.3      19.96       0.8
matrix                 9.62      3.2      75.86       3.2
add.net                8.80      2.9      10.16       0.4

First of, I would guess that the sum of the self.time column _ought_??? to be equal to the sampling time, which it is not
> sum(prof02.out$by.self[,1])

[1] 298.4
> prof02.out$sampling.time

[1] 2373.5
Why does such a large discrepancy happen?

Secondly, looking at the output for as.integer, there is a big difference between total.time and self.time - but I would _guess_ as.integer to be a pretty basic function, which does not call any subfunctions, and in that case I would expect total.time to be equal to self.time.

In short : I think I grasp the total time concept, but I have a hard time understanding self.time. Specifically

I guess some of the answers would be clear if I had a firm grasp of the inner workings of R :)

Thanks in advance, Kasper

-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jul 17 06:12:24 2004

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 02:36:40 EST