Re: R-alpha: linux bug

Ross Ihaka (ihaka@stat.auckland.ac.nz)
Wed, 13 Nov 1996 13:24:32 +1300 (NZDT)


From: Ross Ihaka <ihaka@stat.auckland.ac.nz>
Date: Wed, 13 Nov 1996 13:24:32 +1300 (NZDT)
Message-Id: <199611130024.NAA27204@stat13.stat.auckland.ac.nz>
To: Peter Dalgaard BSA <pd@kubism.ku.dk>
Subject: Re: R-alpha: linux bug
In-Reply-To: <x2d8xk31wi.fsf@bush.kubism.ku.dk>

Peter Dalgaard writes:
 > Thomas Lumley <thomas@biostat.washington.edu> writes:
 > 
 > > 
 > > 
 > > When I call certain dynamically loaded C routines from R 0.13 under linux
 > > I get a floating point exception.  The exception happens before the first
 > > line of the C routine, and in or after the line of code in dotcode.c that
 > > calls the C routine, i.e.
 > > 	fun(cargs[0],cargs[1],...)
 > > 
 > > It does not happen with all routines in a given library. It does not
 > > always happen (only 90-95% of the time). It doesn't matter whether this
 > > routine is the first foreign routine called. Changing the name of the
 > > C function doesn't help.  The identical dynamic load library and R code
 > > works under R0.12.  The same C and R code works when compiled and run
 > > under SunOS 4.1 with the same version of gcc. 
 > > 
 > > I am using Linux kernel 1.2.13 and gcc 2.7.1.  One example of the problem is 
 > > the "coxfit2" routine called by coxph.fit in the survival4 library. The 
 > > error occurs with both the version of the C code in my original port and 
 > > with a slightly altered newer version. For comparison, the "agexact" and 
 > > "coxmart" routines also called by coxph.fit work properly.
 > > 
 > > 
 > > Any ideas?
 > 
 > Does it go away if you don't optimize (either dotcode.c or the
 > dyn.loadable routine)? It's pretty weird, because floating point
 > doesn't seem to be used anywhere near the indicated point. 
 > 
 > It could be a compiler error. Your gcc is a tad old, mine is 2.7.2 and
 > I believe 2.7.2.1 is out.

I suspect that there is a memory management bug lurking in dotcode.c
somewhere.  These are the nastiest bugs to track down because they
only show up when there is a garbage collection and a pointer has not
been protected from the collector - i.e. seemingly at random.  I will
go line by line through the code, but if anyone finds a way of
reproducing this failure it would really help to localize the problem.
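
To make the failure mode concrete, here is a toy sketch in C.  It is
not R's allocator or collector; every name in it (toy_alloc, toy_gc,
toy_protect, read_after_allocs, the 8-cell heap) is hypothetical and
chosen only for illustration.  The protect stack plays roughly the
role that PROTECT/UNPROTECT play in R's C code: the collector treats
only pointers on that stack as live.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CELLS 8        /* deliberately tiny so a collection is easy to force */
#define STACK_MAX  8
#define POISON     (-999.0) /* value written into cells the collector frees */

typedef struct { double value; int in_use; int marked; } Cell;

static Cell  heap[HEAP_CELLS];
static Cell *protect_stack[STACK_MAX];
static int   protect_top = 0;

/* Push a pointer onto the protect stack so the collector keeps it. */
static void toy_protect(Cell *c) {
    assert(protect_top < STACK_MAX);
    protect_stack[protect_top++] = c;
}

/* Pop the n most recently protected pointers. */
static void toy_unprotect(int n) {
    assert(n >= 0 && n <= protect_top);
    protect_top -= n;
}

/* Mark everything on the protect stack, then free every unmarked cell. */
static void toy_gc(void) {
    int i;
    for (i = 0; i < HEAP_CELLS; i++) heap[i].marked = 0;
    for (i = 0; i < protect_top; i++) protect_stack[i]->marked = 1;
    for (i = 0; i < HEAP_CELLS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;
            heap[i].value = POISON;
        }
    }
}

static Cell *find_free(void) {
    int i;
    for (i = 0; i < HEAP_CELLS; i++)
        if (!heap[i].in_use) return &heap[i];
    return NULL;
}

/* Allocate a cell; collect only when the heap is full. */
static Cell *toy_alloc(double v) {
    Cell *c = find_free();
    if (c == NULL) { toy_gc(); c = find_free(); }
    if (c != NULL) { c->in_use = 1; c->value = v; }
    return c;
}

static void toy_reset(void) {
    int i;
    for (i = 0; i < HEAP_CELLS; i++) {
        heap[i].in_use = 0; heap[i].marked = 0; heap[i].value = 0.0;
    }
    protect_top = 0;
}

/* Allocate a cell holding 1.0, then n_temps temporaries holding 0.0,
 * and finally read the first cell back.  If protect is zero the first
 * pointer is left unprotected, which is the bug being illustrated. */
static double read_after_allocs(int n_temps, int protect) {
    Cell *a;
    double result;
    int i;

    toy_reset();
    a = toy_alloc(1.0);
    if (protect) toy_protect(a);
    for (i = 0; i < n_temps; i++)
        (void)toy_alloc(0.0);   /* may or may not trigger toy_gc() */
    result = a->value;          /* stale read if a was collected and reused */
    if (protect) toy_unprotect(1);
    return result;
}
```

With the pointer unprotected, read_after_allocs(7, 0) still returns
1.0 because the heap never fills, while read_after_allocs(8, 0)
returns 0.0: the eighth temporary forces a collection and the freed
cell is handed out again.  With protect set to 1 the value survives
either way.  Whether the bug shows depends only on how much gets
allocated between creating the pointer and using it, which is why
such failures look random from the outside.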

(I have to admit to being amazed that this kind of problem shows up as
rarely as it does.  It must be all that fresh air we breathe and the
clean living we do here.)
	Ross