Re: [Rd] Pipe / Fork: Partial Solution / Providing Connections from C?

From: Jan T. Kim <jtk_at_cmp.uea.ac.uk>
Date: Sat 12 Feb 2005 - 07:42:43 EST

On Fri, Feb 11, 2005 at 02:32:20PM +0100, Peter Dalgaard wrote:
> "Jan T. Kim" <jtk@cmp.uea.ac.uk> writes:
>
> > > Well, that is probably reasonably easy, but (not the least due to that
> > > fact) I'm still surprised that it has not been done already. I can hardly
> > > imagine that I'm the first one to want to use some external utility from
> > > an R program in this way.
> > >
> > > So, what do you R-devel folks do in this case, and what would you
> > > recommend?
> >
> > I'm still curious about this one. If there really is no way of running
> > stuff through external filter processes in R, I'd volunteer to add
> > that.
> >
> > Best regards & thanks in advance, Jan
>
> If you know how, please do. I have a suspicion it might not be as easy
> as it sounds because of the producer/consumer aspects. Notice, though,
> that in most cases you can get by with system() or pipe() and a
> temporary file for either the input or the output.

Personally, I see filtering as a single process. Collecting the input in a file, filtering that into an output file, and then reading that file back in is a more complex process that merely includes filtering as one of its steps. Additional complexity means that there's more that can go wrong, which is why I dislike temporary files.

Specifically, I've seen it happen too often (including to myself) that things went wrong because other processes were interfering with the temporary files (in most cases, other processes running the same program).

> I remember speculating about these matters when I was first introduced
> to pipes in C: They'd show you how to open a pipe for reading and how
> to do it for writing, but not how to do both with the same process.
> Took me a while to realize that there is a nontrivial deadlock issue
> if you try to write to a process that itself is blocked trying to
> write its output. Now that is of course not to say that it cannot be
> done with clever multiplexing and buffering techniques -- or
> multithreading, except that R isn't threaded.

It's clear to me that for real dynamic filtering, you need two processes (or threads): one feeding the filter and one reading from it. This requires that the operating system supports forking, i.e. that the fork package works. Without that, filtering is not possible, at least not in any way I'm aware of.
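
To make the two-process idea concrete, here is a minimal C sketch of what I have in mind, assuming a POSIX system. The name filter_through is made up for this illustration and is not anything in the R sources; a real version would have to hand back connections rather than a string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of R: run `command` through /bin/sh, feed it
 * `input` on stdin, and return everything it prints on stdout as a
 * malloc'ed, NUL-terminated string (NULL on error). */
char *filter_through(const char *command, const char *input)
{
    int to_filter[2], from_filter[2];
    assert(command != NULL && input != NULL);

    if (pipe(to_filter) == -1)
        return NULL;
    if (pipe(from_filter) == -1) {
        close(to_filter[0]);
        close(to_filter[1]);
        return NULL;
    }

    pid_t filter = fork();
    if (filter == 0) {
        /* Filter process: stdin comes from one pipe, stdout goes to the other. */
        dup2(to_filter[0], STDIN_FILENO);
        dup2(from_filter[1], STDOUT_FILENO);
        close(to_filter[0]);
        close(to_filter[1]);
        close(from_filter[0]);
        close(from_filter[1]);
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    close(to_filter[0]);
    close(from_filter[1]);

    /* Writer process: its only job is to push the input into the filter, so
     * it may block on a full pipe without stalling the reader below. */
    pid_t writer = (filter == -1) ? -1 : fork();
    if (writer == 0) {
        size_t left = strlen(input);
        close(from_filter[0]);
        while (left > 0) {
            ssize_t n = write(to_filter[1], input, left);
            if (n <= 0)
                _exit(1);
            input += n;
            left -= (size_t) n;
        }
        _exit(0);                       /* exiting closes the pipe: EOF for the filter */
    }
    close(to_filter[1]);

    /* Parent: collect the filter's output until it signals EOF. */
    size_t len = 0, cap = 4096;
    char *out = (writer == -1) ? NULL : malloc(cap);
    ssize_t n;
    while (out != NULL &&
           (n = read(from_filter[0], out + len, cap - len - 1)) > 0) {
        len += (size_t) n;
        if (cap - len < 2) {            /* keep room for data plus the final NUL */
            char *bigger = realloc(out, cap *= 2);
            if (bigger == NULL)
                free(out);
            out = bigger;
        }
    }
    close(from_filter[0]);

    if (writer > 0)
        waitpid(writer, NULL, 0);
    if (filter > 0)
        waitpid(filter, NULL, 0);
    if (out != NULL)
        out[len] = '\0';
    return out;
}
```

Called as filter_through("sed -e 's/,$//'", text), this should hand back the text with the trailing comma stripped from each line. The separate writer process is the point of the exercise: if the parent did the writing itself, it could block on a full pipe while the filter is blocked writing output that nobody reads.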

So, my plan would be to add some function to src/main/connections.c for setting up a pipe running through an external command and returning the write and read connections for use in the R program. Then, one could do something like (modelled after the pipe example in the base docs):

    library(fork)
    data2 <- c(
      "450, 390, 467, 654,  30, 542, 334, 432, 421,",
      "357, 497, 493, 550, 549, 467, 575, 578, 342,",
      "446, 547, 534, 495, 979, 479")

    # filterpipe() is the proposed function; it does not exist yet
    fp <- filterpipe("sed -e s/,$//")
    pid <- fork(slave = NULL)
    if (pid == 0) {
      # child: feed the data into the filter, then quit
      close(fp$read)
      write(data2, file = fp$write)
      close(fp$write)
      exit()
    } else {
      # parent: read the filtered data, then wait for the child
      close(fp$write)
      x <- scan(fp$read, sep = ",")
      close(fp$read)
      wait(pid)
    }

Thinking about your buffering suggestion, it occurs to me that it *may* be possible to create two anonymous files (of the file("") type) and to connect these to the stdin and the stdout of an external process. In fact, a couple of days ago I checked whether pipe() would perhaps accept optional file arguments for specifying the external process' stdin and stdout, so that I could write, e.g.,

    f <- file("")
    # hypothetical: pipe() does not accept a stdin argument
    p <- pipe("sed -e s/,$//", stdin = f)
    write(data2, file = f)
    scan(p, sep = ",")

but that turned out to be another detour on the way that took me here...
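
For what it's worth, the buffering idea is easy enough to sketch at the C level, where the external process' stdin can simply be an unlinked temporary file. The following is only an illustration under POSIX assumptions; filter_via_tmpfile and the /tmp/filterXXXXXX template are made up for the example and are not anything in the R sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of R: buffer `input` in an unnamed temporary
 * file, run `command` with that file as its stdin, and return the command's
 * stdout as a malloc'ed, NUL-terminated string (NULL on error). */
char *filter_via_tmpfile(const char *command, const char *input)
{
    char path[] = "/tmp/filterXXXXXX";  /* assumes /tmp exists and is writable */
    int out_pipe[2];
    assert(command != NULL && input != NULL);

    /* mkstemp() creates the file exclusively; unlinking it at once leaves no
     * name behind for another process to stumble over. */
    int fd = mkstemp(path);
    if (fd == -1)
        return NULL;
    unlink(path);

    /* Buffer the complete input before the filter is started. */
    size_t left = strlen(input);
    while (left > 0) {
        ssize_t written = write(fd, input, left);
        if (written <= 0) {
            close(fd);
            return NULL;
        }
        input += written;
        left -= (size_t) written;
    }
    if (lseek(fd, 0, SEEK_SET) == (off_t) -1 || pipe(out_pipe) == -1) {
        close(fd);
        return NULL;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Filter process: stdin is the buffered file, stdout is the pipe. */
        dup2(fd, STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(fd);
        close(out_pipe[0]);
        close(out_pipe[1]);
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    close(fd);
    close(out_pipe[1]);

    /* Parent: collect the filter's output until it signals EOF. */
    size_t len = 0, cap = 4096;
    char *out = (pid == -1) ? NULL : malloc(cap);
    ssize_t n;
    while (out != NULL &&
           (n = read(out_pipe[0], out + len, cap - len - 1)) > 0) {
        len += (size_t) n;
        if (cap - len < 2) {            /* keep room for data plus the final NUL */
            char *bigger = realloc(out, cap *= 2);
            if (bigger == NULL)
                free(out);
            out = bigger;
        }
    }
    close(out_pipe[0]);

    if (pid > 0)
        waitpid(pid, NULL, 0);
    if (out != NULL)
        out[len] = '\0';
    return out;
}
```

Because the whole input is on disk before the filter starts, a single child process suffices and the parent only ever reads, so the deadlock cannot arise. The price is that nothing is filtered until all of the input has been collected, which is not the dynamic filtering I'm really after.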

Best regards, Jan

-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk@cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Feb 12 05:52:43 2005
