[Rd] Addressing Problems: R with Fortran and OpenMP

From: Lars Wißler <jahftw_at_googlemail.com>
Date: Mon, 08 Aug 2011 18:44:51 +0200


Hello,

I am writing an R program that calls nested Fortran routines for the calculations and uses OpenMP for parallelization. On 64-bit systems I get varying errors that point to memory addressing problems; on a 32-bit system the application runs without problems. The errors on 64-bit range from null-pointer failures, through segmentation faults and stack imbalances (with changing differences, and I am not using C/C++), to runs that finish without an exception but return wrong values. Sometimes it even works correctly on 64-bit, mostly when executed a second time within the same R session. Sometimes an endless loop of "Error: bad target context--should NEVER happen; please bug.report() [R_run_onexits]" appears.

The problem seems to be platform independent: I have tried Windows 7, Windows Vista and openSUSE 11.3 (x86-64). Evaluation with valgrind reveals a possible major memory leak, though the leak appears on 32-bit systems as well, just without any errors. I am using the gfortran 4.5.0 x86-64 compiler and R version 2.12.

valgrind log extract:
==22989== 25,559,200 bytes in 4 blocks are possibly lost in loss record 5,678 of 5,678
==22989== at 0x4C26C3A: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==22989== by 0x4F39907: Rf_allocVector (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4EDAF96: duplicate1 (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4FC204E: R_subassign3_dflt (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4FC24A2: do_subassign3 (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F0346F: do_set (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F02EB1: applydefine (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F035EB: do_begin (in /usr/lib64/R/lib/libR.so)
==22989== by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989==
==22989== LEAK SUMMARY:
==22989== definitely lost: 82 bytes in 1 blocks
==22989== indirectly lost: 0 bytes in 0 blocks
==22989== possibly lost: 109,720,966 bytes in 26,330 blocks
==22989== still reachable: 23,101,045 bytes in 5,105 blocks
==22989== suppressed: 0 bytes in 0 blocks

All pointers (arguments) in Fortran are explicitly declared as integer*4, or as real*8 for doubles.

I am really lost in this, because I just don't know where to start and stop looking. It is obvious to me that there is some kind of memory addressing problem related to the 64-bit architecture, but since I don't know whether it is related to R, Fortran, OpenMP or a combination of those, it is very hard to find. Also, the program is part of a library with 40+ interacting files, so it would be really hard and time-consuming to cut the program down to a size where the error is still reproduced and the code is still manageable.

Any help, ideas or suggestions as to what to do, where to look and what to try would be very welcome. I have been trying to solve this problem for nearly two weeks and have read everything I could find regarding x86-64, R, Fortran, OpenMP and memory issues. I could post more, and more specific, information regarding the errors, but then the description would get even longer. So if I need to supply more information, please tell me and I will do so.

Regards
Lars

Following are the code snippets for the Fortran call and the entry point of the Fortran program with the OpenMP definition. If the program fails with a statement about where it failed (e.g. a segmentation fault), it gives this call as the location. But since I only get R errors and not Fortran errors, the error might actually occur anywhere in the Fortran code.

 z <- .Fortran("nlrdtirg",
                as.integer(si),
                as.integer(ngrad),
                as.integer(ddim[1]),
                as.integer(ddim[2]),
                as.integer(ddim[3]),
                as.logical(mask),
                as.double(object@btb),
                as.double(sdcoef),
                th0=as.double(s0),
                D=double(6*prod(ddim)),
                as.integer(200),
                as.double(1e-6),
                res=double(ngrad*prod(ddim)),
                rss=double(prod(ddim)),
                double(ngrad*num_threads),
                as.integer(num_threads),
                PACKAGE="dti",DUP=TRUE)


      subroutine nlrdtirg(s,nb,n1,n2,n3,mask,b,sdcoef,th0,D,niter,eps,
     1                    res,rss,varinv,nt)

      use omp_lib
      implicit logical*4 (a-z)
      integer*4 nb,n1,n2,n3,s(nb,n1,n2,n3),niter,nt,tid
      logical mask(n1,n2,n3)
      real*8 D(6,n1,n2,n3),b(6,nb),res(nb,n1,n2,n3),
     1    th0(n1,n2,n3),eps,rss(n1,n2,n3),sdcoef(4),varinv(nt*nb)
      integer*4 i1,i2,i3,j

      DO i3=1,n3
         DO i2=1,n2
C$OMP PARALLEL DEFAULT(NONE)
C$OMP& SHARED(mask,s,b,sdcoef,th0,D,res,rss,varinv,nb,niter,eps)
C$OMP& FIRSTPRIVATE(i2,i3,n1)
C$OMP& PRIVATE(i1,j,tid)
C$OMP DO SCHEDULE(DYNAMIC,1)

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 08 Aug 2011 - 16:49:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 08 Aug 2011 - 19:50:15 GMT.