Re: [Rd] RFC: "loop connections"

From: <dhinds_at_sonic.net>
Date: Sat 27 Aug 2005 - 20:11:57 GMT

Martin Maechler <maechler@stat.math.ethz.ch> wrote:

> In the mean time, I think it has become clear that
> "loopconnection" isn't necessarily a better name, and that
> textConnection() has been there in "the S literature" for a
> good reason and for quite a while.
> Let's forget about the naming and the exact UI for the moment.

That is entirely fine with me.

> I think the main point of David's proposal is still worth
> consideration: One way to see text connections is as a way to
> treat some kind of R objects as "generalized files" i.e., connections.
> And AFAICS David proposes to enlarge the kind of R objects that
> can be dealt with as connections
> from {"character"}
> to {"character", "raw"}
> something which has some appeal to me.
> IIUC, Brian Ripley is doubting the potential use for the
> proposed generalization, whereas David makes a point of someone
> else (the 'caTools' author) having written raw2bin / bin2raw function
> for a related use case.

> Maybe you can elaborate on the above a bit, David?

I'm not sure what more can be said on the subject. Most connection types support both text mode and binary mode, so this is partly a proposal for symmetry and consistency. Prof. Ripley is correct that binary anonymous connections provide overlapping functionality, but their semantics are slightly different, and so is their performance (see the timings below). I don't see an advantage in having the "text-like" connection support only text access.

I ran some quick benchmarks on three implementations. The task was conversion back and forth between a numeric vector of length 1000 and a packed raw vector of single precision floats, repeated 1000 times. The first method uses a new anonymous connection for each transformation. The second reuses a single anonymous connection. The third uses a new raw textConnection for each transformation.

  usr  sys  elapsed   method
  1.5  9.5   14.6     anonymous
  1.1  0.1    1.2     persistent
  0.9  0.0    0.9     raw

Setting up and tearing down anonymous connections is very slow (at least on Windows) because it requires substantial OS intervention. If a program can be easily organized so that a single connection can be used, performance is much better.
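
The conversion being timed, between a numeric vector and a packed raw vector of single precision floats, can be sketched in plain C. This is only an illustration of the task, not the code that was benchmarked: pack_floats() and unpack_floats() are hypothetical names, native byte order is assumed, and no R API is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: narrow n doubles to single precision and store
   them back to back in out, in native byte order. out must have room
   for n * sizeof(float) bytes. Returns the number of bytes written. */
size_t pack_floats(const double *in, size_t n, unsigned char *out)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        float f = (float) in[i];               /* precision is lost here */
        memcpy(out + i * sizeof f, &f, sizeof f);
    }
    return n * sizeof(float);
}

/* Hypothetical helper: the inverse. Reads nbytes / sizeof(float) packed
   floats from in and widens each to double. Trailing bytes that do not
   form a whole float are ignored. Returns the number of values read. */
size_t unpack_floats(const unsigned char *in, size_t nbytes, double *out)
{
    size_t n = nbytes / sizeof(float);
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        float f;
        memcpy(&f, in + i * sizeof f, sizeof f);
        out[i] = (double) f;
    }
    return n;
}
```

Values that are exactly representable in single precision survive the round trip unchanged; others come back rounded to the nearest float.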

I would appreciate feedback on how to improve raw_write() for the case of appending to an existing vector. Is it possible to reserve free space at the end of a vector for appending? I see that there is a distinction between LENGTH() and TRUELENGTH() but I'm not sure if this is the intended use.
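
One common way to make appends cheap, independent of how R itself manages vectors, is to track the number of bytes in use separately from the number allocated and to grow the allocation geometrically. The sketch below is plain C with hypothetical names (rawbuf, rawbuf_append, rawbuf_free). It uses realloc() rather than R's allocator, and it makes no claim about whether LENGTH() and TRUELENGTH() are meant to play these two roles.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer: length is what the caller sees,
   capacity is what has actually been allocated. */
typedef struct {
    unsigned char *data;
    size_t length;    /* bytes in use */
    size_t capacity;  /* bytes allocated, always >= length */
} rawbuf;

/* Append n bytes from src. Doubles the capacity when more room is
   needed, so a long run of small appends costs amortized O(1) each.
   Returns 0 on success, -1 on failure (buffer left unchanged). */
int rawbuf_append(rawbuf *b, const void *src, size_t n)
{
    assert(b != NULL && (src != NULL || n == 0));
    if (n > (size_t)-1 - b->length)
        return -1;                              /* length would overflow */
    size_t need = b->length + n;
    if (need > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 64;
        while (cap < need) {
            if (cap > (size_t)-1 / 2) { cap = need; break; }
            cap *= 2;
        }
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->capacity = cap;
    }
    if (n > 0)
        memcpy(b->data + b->length, src, n);
    b->length = need;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
void rawbuf_free(rawbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->length = b->capacity = 0;
}
```

A buffer starts as `rawbuf b = {NULL, 0, 0};`. The free space between length and capacity is what lets most appends avoid a reallocation and copy.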

> In any case, as you might have guessed by now, R-core would have
> been more positive to a proposal to generalize current
> textConnection() - fully back-compatibly - rather than renaming
> it first.

I have no interest in sacrificing backward compatibility; I did intend that there would always be a textConnection() entry point, if only as a wrapper for the new constructor. The only reason for a new name (and I'm certainly open to suggestions) is that the notion of a binary or raw textConnection seemed wrong.


R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun Aug 28 06:28:06 2005
