[Rd] Fixing the HDF5 package: the on.exit mystery

From: H C Pumphrey <H.C.Pumphrey_at_ed.ac.uk>
Date: Fri, 04 Mar 2011 10:35:08 +0000


Dear all,

I'm trying to fix a subtle bug in the hdf5 package. This package provides an interface to the HDF5 library and hence allows one to load data into R from files in the HDF5 format. The bug appeared during a period in which R changed but the package did not.

I include below both the R and C code, stripped of everything except what is needed to show the bug. What is supposed to happen is:

(*) the user calls R function hdf5load()
(*) hdf5load() calls C function do_hdf5load()
(*) do_hdf5load() opens the HDF5 file, recording its HDF5 file id (fid)
(*) do_hdf5load() calls C function setup_onexit(), passing fid to it
(*) setup_onexit() sets up the on.exit call to be R function hdf5cleanup() with fid as its argument
(*) C function do_hdf5load() walks the HDF5 file's tree structure of groups of groups of [...] of datasets, mapping them to an R list of lists of [...] of array variables. This recursive procedure may have a variety of exit points buried inside itself.
(*) C function do_hdf5load() exits for some reason. R function hdf5load() therefore exits, but before doing so it calls its on.exit code (which is hdf5cleanup(fid) with the right value of fid), closing the file.
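
The intended behaviour in the steps above can be sketched in pure R. This is only an illustration under stated assumptions: fake_open(), fake_cleanup() and load_sketch() are hypothetical stand-ins, not part of the hdf5 package, and the handler is registered directly from R code rather than from C as the package does.

```r
# Hypothetical stand-ins for H5Fopen/H5Fclose: they only track which ids are open.
next_id  <- 0L
open_ids <- integer(0)

fake_open <- function(path) {
  next_id  <<- next_id + 1L
  open_ids <<- c(open_ids, next_id)
  next_id
}

fake_cleanup <- function(fid) {
  open_ids <<- setdiff(open_ids, fid)
  invisible(NULL)
}

# Stand-in for hdf5load(): the handler is registered in this function's own
# frame, so it runs on a normal return and on an error alike.
load_sketch <- function(path, fail = FALSE) {
  fid <- fake_open(path)
  on.exit(fake_cleanup(fid))
  if (fail) stop("simulated failure while walking the file")
  list(path = path, fid = fid)   # stands in for the list of lists
}

res <- load_sketch("day1.h5")
try(load_sketch("day2.h5", fail = TRUE), silent = TRUE)
length(open_ids)   # 0: nothing is left open after either exit path
```

In the real package fid only becomes known inside do_hdf5load(), which is why the registration is attempted from C.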

The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is usually not called, meaning that the file is left open. You might not notice this, but if you are processing a few years' worth of data, stored as one file per day, you may hit the system limit on the number of open files and be unable to open any more.

I have a suspicion that the problem dates to a change in R at 2.8.0. help(on.exit) notes under "Details" that: "Where ‘expr’ was evaluated changed in R 2.8.0 ..." But it is not clear how I should modify the C code to force hdf5cleanup() to be reliably called when do_hdf5load() exits.
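
One property of on.exit that bears on this is that a handler belongs to the function frame in which on.exit is called, and runs when that frame exits. The R-only sketch below shows this with hypothetical functions (note(), helper(), outer_fn()); it does not reproduce the C-level eval() in setup_onexit. It is also worth noting that the R wrapper passes sys.frame(sys.parent()), which is the frame of hdf5load()'s caller rather than hdf5load()'s own frame; whether that is the env later handed to setup_onexit is not visible in the snipped code.

```r
# Record the order in which things happen.
events <- character(0)
note <- function(msg) events <<- c(events, msg)

helper <- function() {
  on.exit(note("cleanup"))   # attaches to helper's frame, not to its caller's
  note("helper body")
}

outer_fn <- function() {
  helper()                   # "cleanup" fires as soon as helper() returns
  note("outer body")
}

outer_fn()
events   # "helper body" "cleanup" "outer body"
```

So the environment in which the on.exit call is evaluated decides whose exit triggers the cleanup.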

Any help appreciated.

Hugh (possibly the nearest thing to a maintainer that the hdf5 package currently has)

(R and C code follow)

#----------------------------------------------------------------
"hdf5load" <- function (file, load = TRUE, verbosity = 0, tidy = FALSE) {
   call <- sys.call()
   .External("do_hdf5load", call, sys.frame(sys.parent()), file, load,
             as.integer (verbosity), as.logical(tidy),
             PACKAGE="hdf5")
}

"hdf5cleanup" <- function (fid)
{
   call <- sys.call()
   print("In hdf5cleanup: calling do_hdf5cleanup")
   invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid,
             PACKAGE="hdf5"))
}

#----------------------------------------------------------------


/*---------------------------------------------------------------*/
SEXP do_hdf5load (SEXP args)
{
   /* Code to process args snipped */
   if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0)
     errorcall (call, "unable to open HDF file: %s", path);

   setup_onexit (fid, env);
   /* Messy code to walk tree structure of file snipped */
}

/* The following function shown in its entirety */
setup_onexit (hid_t fid, SEXP env)
{
   eval (lang2 (install ("on.exit"),
                lang2 (install ("hdf5cleanup"),
                       ScalarInteger (fid))),
         env);
}

SEXP
do_hdf5cleanup (SEXP args)
{
   /* Code to process args snipped */
   /* various cleanup things done including this: */
   H5Fclose(fid);
}

/*---------------------------------------------------------------*/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 04 Mar 2011 - 10:47:30 GMT