Re: [Rd] recursive data-structures in R - An S4 "node" Class

From: Martin Morgan <mtmorgan_at_fhcrc.org>
Date: Sun, 24 Jan 2010 09:40:06 -0800

Hi Russ -- some ideas below...

On 01/23/2010 06:18 PM, rt wrote:
> Hi,
>
> In an effort to learn S4 objects, I am trying to port some c based tree
> code to S4 object. My immediate goal is to use .Call() interface for
> calling my c code from R. My long term goal is to understand how to write c
> structs that follows S4 classes and not the other-way-around.
>
> The c struct for the node is :
> typedef struct node{
> int c;
> int n;
> inode **nodes; //length = n
> } inode;
>
> I am trying to define corresponding S4 class. The closest I could come was:
> setClassUnion("listOrNULL",members=c("list", "NULL"))
> setClass(Class = "Inode", representation(c =
> "numeric",n="numeric",nodes="listOrNULL"))
>
> This does not work, as setting an element of the list of nodes in a node
> object overwrites the original object. I have read that there are dirty ways
> of writing code that gets around this issue. For example, using the assign()
> operator.
> I am not even sure if it will work.
>
> I would appreciate if someone suggest ways in which this problem is best
> approached.
> Is it possible to have recursive data-structures within R ?

Maybe for your C structure I'd have started with

  setClass("INodeList", contains="list")   setValidity("INodeList", function(object) {

      ok <- sapply(object, is, "INode")
      if (!all(ok)) "'INodeList' elements must all be 'INode'"
      else TRUE

  })

  setClass("INode",

           representation=representation(
             c="integer", n="integer", nodes="INodeList"))

and then

> tree <- new("INode", n=2L,

             nodes=new("INodeList", list(
                new("INode"),
                new("INode", n=1L,
                    nodes=new("INodeList", list(new("INode")))))))

creates a recursive structure. One might update with

> anode <- new("INode", n=1L,

               nodes=new("INodeList", list(new("INode", c=1L, n=0L))))
> tree_at_nodes[[2]]@nodes[[1]] <- anode

or write a recursive procedure (e.g., to count the number of nodes)

  setGeneric("icount", function(x) standardGeneric("icount"))   setMethod(icount, "INode", function(x) {

      res <- sapply(x_at_nodes, icount)
      if (length(res) == 0) 1 else sum(res) + 1
  })

> icount(tree)

[1] 5

or maybe if 'c' were meant to be the number of nodes below the current

  setGeneric("icumm", function(x) standardGeneric("icumm"))   setMethod("icumm", "INode", function(x) {

      if (length(x_at_nodes) == 0) new("INode", x, c=1L)
      else {
          nodes = new("INodeList", lapply(x_at_nodes, icumm))
          c <- 1L + sum(sapply(nodes, slot, "c"))
          new("INode", x, c=c, nodes=nodes)
      }

  })

> t1 <- icumm(tree)
> t1_at_c

[1] 5
> t1_at_nodes[[1]]@c

[1] 1
> t1_at_nodes[[2]]@c

[1] 3

All of this says 'yes, recursive data structures are possible within [S4 classes in] R'.

But something like icumm makes a full copy of 'tree' (probably several more!, actually, thinking about it, probably many more since the last new("INodes", ...) likely recopies all the nodes below the current; rewriting this to reduce the copying [a revised algorithm, rather than clever R tricks or jumping to C [as ?rapply does] would be an interesting exercise for the algorithmically inclined reader), and the result is a different object, rather than an updated 'tree'. This is the copy-on-change semantics of R, and you'll not get around it easily (the usual approach is to use an environment, but then your R user accustomed to copy-on-change is surprised with action-at-a-distance, where say tree2 <- tree and changing 'tree' unexpectedly alters the value of 'tree2'). Another interesting exercise for the reader is to use standard OOP patterns to replace the conditional code with method dispatch, e.g., with a derived class NullINode rather than the R-centric classUnion.

> What is the best way of writing an interface with pre-existing c code?

'Best' is a strong word. If you're trying to capture data structures that have well-developed C-level support, then perhaps what you want is to return an 'ExternalPtr'; your S4 class would have a slot containing the ExternalPtr, and any access is via methods that end up doing .Call to a C-level manipulation on the pointer's content. You have some responsibility to make sure that the object is managed appropriately (e.g., with a finalizer that frees memory when the ExternalPtr is no longer referenced in R). You also need to protect or prepare your user for the action-at-distance problems mentioned above.

ame="0216qlink4">Hope that helps.

Martin

> What are the best practices for deveolping new code in c for such purposes?
>
> Thanks,
>
> Russ
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 24 Jan 2010 - 17:42:11 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 25 Jan 2010 - 07:00:15 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive