Re: [Rd] recursive data-structures in R - An S4 "node" Class

From: rt <taylor.russ_at_gmail.com>
Date: Sun, 24 Jan 2010 15:26:44 -0600

Hi Martin,

Thanks for your detailed response. Just examining your code written in the context of my specific problem was helpful. Few thoughts and questions:

On Sun, Jan 24, 2010 at 11:40 AM, Martin Morgan <mtmorgan_at_fhcrc.org> wrote:

> Hi Russ -- some ideas below...
>
> On 01/23/2010 06:18 PM, rt wrote:
> > Hi,
> >
> > In an effort to learn S4 objects, I am trying to port some c based tree
> > code to S4 object. My immediate goal is to use .Call() interface for
> > calling my c code from R. My long term goal is to understand how to write
> c
> > structs that follows S4 classes and not the other-way-around.
> >
> > The c struct for the node is :
> > typedef struct node{
> > int c;
> > int n;
> > inode **nodes; //length = n
> > } inode;
> >
> > I am trying to define corresponding S4 class. The closest I could come
> was:
> > setClassUnion("listOrNULL",members=c("list", "NULL"))
> > setClass(Class = "Inode", representation(c =
> > "numeric",n="numeric",nodes="listOrNULL"))
> >
> > This does not work, as setting an element of the list of nodes in a node
> > object overwrites the original object. I have read that there are dirty
> ways
> > of writing code that gets around this issue. For example, using the
> assign()
> > operator.
> > I am not even sure if it will work.
> >
> > I would appreciate if someone suggest ways in which this problem is best
> > approached.
> > Is it possible to have recursive data-structures within R ?
>
> Maybe for your C structure I'd have started with
>
> setClass("INodeList", contains="list")
> setValidity("INodeList", function(object) {
> ok <- sapply(object, is, "INode")
> if (!all(ok)) "'INodeList' elements must all be 'INode'"
> else TRUE
>
> })
>
> setClass("INode",
> representation=representation(
> c="integer", n="integer", nodes="INodeList"))
>
> and then
>
> > tree <- new("INode", n=2L,
> nodes=new("INodeList", list(
> new("INode"),
> new("INode", n=1L,
> nodes=new("INodeList", list(new("INode")))))))
>
> creates a recursive structure. One might update with
>
> > anode <- new("INode", n=1L,
> nodes=new("INodeList", list(new("INode", c=1L, n=0L))))
> > tree_at_nodes[[2]]@nodes[[1]] <- anode
>

I imagine, this makes a copy of anode.

>
> or write a recursive procedure (e.g., to count the number of nodes)
>
> setGeneric("icount", function(x) standardGeneric("icount"))
> setMethod(icount, "INode", function(x) {
> res <- sapply(x_at_nodes, icount)
> if (length(res) == 0) 1 else sum(res) + 1
> })
>
> > icount(tree)
> [1] 5
>
> or maybe if 'c' were meant to be the number of nodes below the current
>
> setGeneric("icumm", function(x) standardGeneric("icumm"))
> setMethod("icumm", "INode", function(x) {
> if (length(x_at_nodes) == 0) new("INode", x, c=1L)
> else {
> nodes = new("INodeList", lapply(x_at_nodes, icumm))
> c <- 1L + sum(sapply(nodes, slot, "c"))
> new("INode", x, c=c, nodes=nodes)
> }
> })
>
> > t1 <- icumm(tree)
> > t1_at_c
> [1] 5
> > t1_at_nodes[[1]]@c
> [1] 1
> > t1_at_nodes[[2]]@c
> [1] 3
>
> All of this says 'yes, recursive data structures are possible within [S4
> classes in] R'.
>
> But something like icumm makes a full copy of 'tree' (probably several
> more!, actually, thinking about it, probably many more since the last
> new("INodes", ...) likely recopies all the nodes below the current;
> rewriting this to reduce the copying [a revised algorithm, rather than
> clever R tricks or jumping to C [as ?rapply does] would be an
> interesting exercise for the algorithmically inclined reader), and the
> result is a different object, rather than an updated 'tree'. This is the
> copy-on-change semantics of R, and you'll not get around it easily (the
> usual approach is to use an environment, but then your R user accustomed
> to copy-on-change is surprised with action-at-a-distance, where say
> tree2 <- tree and changing 'tree' unexpectedly alters the value of
> 'tree2'). Another interesting exercise for the reader is to use standard
> OOP patterns to replace the conditional code with method dispatch, e.g.,
> with a derived class NullINode rather than the R-centric classUnion.
>

>From your description, it appears that a reasonable solution would be:
(a) Make semantics of any function explicit and clear (b) Have separate functions for copyInode(x), vs. aliasInode(x) with aliasInode(x) implemented using assign( env=parent.env())??

In this case, I am not sure how to implement a replace method. Is it possible to define "[<-" with an option copy_or_alias? Are the problems associated with aliasing and greater in an interpreted language?
The main reason for doing this would be to minimize copying.

>
> > What is the best way of writing an interface with pre-existing c code?
>
> 'Best' is a strong word. If you're trying to capture data structures
> that have well-developed C-level support, then perhaps what you want is
> to return an 'ExternalPtr'; your S4 class would have a slot containing
> the ExternalPtr, and any access is via methods that end up doing .Call
> to a C-level manipulation on the pointer's content. You have some
> responsibility to make sure that the object is managed appropriately
> (e.g., with a finalizer that frees memory when the ExternalPtr is no
> longer referenced in R). You also need to protect or prepare your user
> for the action-at-distance problems mentioned above.
>
>
I already have considerable amount of c code involving recursive structures that I would like to call from R.
It seems that ExternalPtr is an easier way of doing the job than tryong to create any other type of S4 class.
As you point out, I will need to think carefully about how the objects are created/accessed/modified/destroyed.
I do have functions analogous to newInode(), copyInode(), aliasInode(), addInode() freeInode(), killInode().
I am still not clear if it would be possible to write standard S4 methods particularly replaceMethods
that can accomplish common tasks such as copy/add/alias/replace etc.

> Hope that helps.
>
> Martin
>
> > What are the best practices for deveolping new code in c for such
> purposes?
> >
> > Thanks,
> >
> > Russ
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>

Any pointers to accessible examples of interfacing ExternalPtrs and S4 objects would be appreciated.
Once again, thank for your detailed answer to my questions. I can see the benefits of using R, but getting there seems almost tougher than coding everything in c.

Best Regards,

Russ

        [[alternative HTML version deleted]]



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun 24 Jan 2010 - 21:31:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 25 Jan 2010 - 15:50:16 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive