Re: [Rd] R strings, null-terminated or size delimited?

From: Guillaume Yziquel <guillaume.yziquel_at_citycable.ch>
Date: Sun, 22 Nov 2009 00:31:24 +0100

Simon Urbanek wrote:
>
> On Nov 21, 2009, at 4:12 PM, Guillaume Yziquel wrote:
>
>> Hello.
>>
>> I've been looking at vecsexps for my binding.
>>
>> Concerning strings, I'm wondering: are they supposed to be
>> null-delimited?
>
> Yes, they are null-delimited when you create/access them.

OK. Fair enough. But is it guaranteed that the null terminator sits exactly where the vecsxp field of the VECSEXP says the R vector should end? Let me rephrase that:

-1- Should I consider it a bug if the two pieces of information differ?

-2- Which of the two is the "safest" to rely on?
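
To make the two questions concrete, here is a minimal sketch in plain C of a string that carries both a stored length and a trailing NUL. The ToyStr type and the toy_str_* helpers are hypothetical and do not reproduce R's actual CHARSXP layout; they only show when the two notions of length agree and when they diverge (an embedded NUL byte).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed, NUL-terminated string.
   This is NOT R's CHARSXP layout, only a model of the two conventions. */
typedef struct {
    size_t length;   /* number of payload bytes, as stored in the header */
    char   data[];   /* payload followed by one terminating NUL byte */
} ToyStr;

/* Build a ToyStr from n bytes; the bytes may contain embedded NULs. */
static ToyStr *toy_str_new(const char *bytes, size_t n)
{
    ToyStr *s = malloc(sizeof(ToyStr) + n + 1);
    assert(s != NULL);
    s->length = n;
    memcpy(s->data, bytes, n);
    s->data[n] = '\0';   /* terminator always written after the payload */
    return s;
}

/* Returns 1 when the header length and the first NUL agree, 0 otherwise. */
static int toy_str_lengths_agree(const ToyStr *s)
{
    return strlen(s->data) == s->length;
}
```

In this model, "abc" gives the same answer either way, while the three bytes 'a', NUL, 'b' give a header length of 3 and a strlen of 1. If the payload can be arbitrary bytes, the stored length is the one that cannot be fooled.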

>> Are they delimited by the info in the SEXPHEADER macro in Rinternals.h?
>
>
> You should not be touching or reading that.

I believe I should. I'd like the OCaml / R binding to be closely knit to R internals. One reason is speed; the other is that I'd like to use camlp4 to write syntax extensions that mix OCaml and R syntax. It's therefore important for me not to rely on the R interpreter being active when building R values, or when marshaling R values via OCaml. There are numerous other issues aside from this one.

I'm already using #define USE_RINTERNALS in my .c file to inspect R values.

>> Basically, what are the macros or functions to access the values of
>> the vecsexps?
>
> VECTOR_ELT and SET_VECTOR_ELT (assuming that you're referring to VECSXP
> which are generic vectors).

No. I'm referring to INTSXP for now. But I see what you mean:

> #define INTEGER(x) ((int *) DATAPTR(x))
> #define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i]

VECTOR_ELT is not suitable for INTSXP arrays. I need to convert the INTSXP array to an OCaml list / array.
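
The INTEGER macro quoted above amounts to "skip the header and treat what follows as an int array". Here is a minimal sketch of that idea in plain C. ToyVecHeader and the TOY_* macros are hypothetical and much simpler than R's real vector header; the copy step stands in for what a binding would do before building the OCaml array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vector header; R's real header has more fields. */
typedef struct {
    int type;     /* type tag, illustrative only */
    int length;   /* number of elements stored after the header */
} ToyVecHeader;

/* The data lives immediately after the header, in the spirit of DATAPTR. */
#define TOY_DATAPTR(x) ((void *) (((ToyVecHeader *) (x)) + 1))
#define TOY_INTEGER(x) ((int *) TOY_DATAPTR(x))
#define TOY_LENGTH(x)  (((ToyVecHeader *) (x))->length)

/* Allocate header and payload in a single block. */
static ToyVecHeader *toy_alloc_int_vector(int n)
{
    ToyVecHeader *v = malloc(sizeof(ToyVecHeader) + (size_t) n * sizeof(int));
    assert(v != NULL);
    v->type = 13;   /* arbitrary tag for this sketch */
    v->length = n;
    return v;
}

/* Copy the elements into a buffer owned by the caller. */
static int *toy_copy_ints(const ToyVecHeader *v)
{
    size_t bytes = (size_t) v->length * sizeof(int);
    int *out = malloc(bytes ? bytes : 1);
    assert(out != NULL);
    memcpy(out, TOY_INTEGER(v), bytes);
    return out;
}
```

The element count comes from the header and never from scanning the data, which is why an integer vector needs no terminator.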

>> I'm thinking of CHARSXPs and INTSXPs for the moment...
>
> Those are entirely different - CHARSXP are not vectors but strings (see
> mkChar et al., CHAR, ...) and INTSXP are integer arrays (in C speak)
> accessed using INTEGER.

OK. They're not vectors. They're VECTOR_SEXPRECs.

> Please read R-exts - it's better than guessing.

Funny, I have R-exts.pdf and R-ints.pdf open. They're fine when it comes to writing R extensions. Not when writing bindings embedding R into OCaml so that you can beta-reduce isomorphically in R and OCaml.

> Cheers,
> Simon

I'm already using heretical features in OCaml (namely Obj.magic) in order to do this binding. I do not mind using heretical features of the R API.

I do not mean to be a pain, but I have to do what needs to be done. If I find on my way that #define USE_RINTERNALS is overkill, I'll gladly drop it.

For instance, here's one of my issues: I've extracted the R SEXP for the "str" symbol. It's a promise. Now, how do I map such a SEXP to an OCaml function? Haven't found that in R-ints.pdf or R-exts.pdf. There's talk about functions, but promises are somewhat overlooked. However, such a mapping is crucial to me.

I was not guessing when I was trying to look at the internal structure of R data. Simply trying to get a grip on how to execute promises, and therefore examining such a promise:

> # R.Internal.Pretty.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.Pretty.t =
> PROMISE
>  {value = SYMBOL None;
>   expr =
>    CALL (SYMBOL (Some ("lazyLoadDBfetch", BUILTIN)),
>     [INT [105; 153119]; Unknown; Unknown; Unknown]);
>   env = Unknown}

Or, following structures in Rinternals.h:

> # R.Internal.C.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.C.t =
> Val
>  {content =
>    PROMSXP
>     {prom_value =
>       Val
>        {content =
>          SYMSXP
>           {pname = Val {content = NILSXP};
>            sym_value = R.Internal.C.Recursive <lazy>;
>            internal = Val {content = NILSXP}}};
>      R.Internal.C.expr =
>       Val
>        {content =
>          LANGSXP
>           {carval =
>             Val
>              {content =
>                SYMSXP
>                 {pname = Val {content = CHARSXP "lazyLoadDBfetch"};
>                  sym_value = Val {content = BUILTINSXP 687};
>                  internal = Val {content = NILSXP}}};
>            cdrval =
>             Val
>              {content =
>                LISTSXP
>                 {carval = Val {content = INTSXP [105; 153119]};
>                  cdrval =
>                   Val
>                    {content =
>                      LISTSXP
>                       {carval =
>                         Val
>                          {content =
>                            SYMSXP
>                             {pname = Val {content = CHARSXP "datafile"};
>                              sym_value =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname = Val {content = NILSXP};
>                                    sym_value = R.Internal.C.Recursive <lazy>;
>                                    internal = Val {content = NILSXP}}};
>                              internal = Val {content = NILSXP}}};
>                        cdrval =
>                         Val
>                          {content =
>                            LISTSXP
>                             {carval =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname =
>                                     Val {content = CHARSXP "compressed"};
>                                    sym_value =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname = Val {content = NILSXP};
>                                          sym_value =
>                                           R.Internal.C.Recursive <lazy>;
>                                          internal = Val {content = NILSXP}}};
>                                    internal = Val {content = NILSXP}}};
>                              cdrval =
>                               Val
>                                {content =
>                                  LISTSXP
>                                   {carval =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname =
>                                           Val {content = CHARSXP "envhook"};
>                                          sym_value =
>                                           Val
>                                            {content =
>                                              SYMSXP
>                                               {pname = Val {content = NILSXP};
>                                                sym_value =
>                                                 R.Internal.C.Recursive <lazy>;
>                                                internal =
>                                                 Val {content = NILSXP}}};
>                                          internal = Val {content = NILSXP}}};
>                                    cdrval = Val {content = NILSXP};
>                                    tagval = Val {content = NILSXP}}};
>                              tagval = Val {content = NILSXP}}};
>                        tagval = Val {content = NILSXP}}};
>                  tagval = Val {content = NILSXP}}};
>            tagval = Val {content = NILSXP}}};
>      R.Internal.C.env = Val {content = ENVSXP}}}
> #

For instance, an issue I'd like advice on is: what does such a symbol mean?

> SYMSXP
>  {pname = Val {content = CHARSXP "datafile"};
>   sym_value =
>    Val
>     {content =
>       SYMSXP
>        {pname = Val {content = NILSXP};
>         sym_value = R.Internal.C.Recursive <lazy>;
>         internal = Val {content = NILSXP}}};
>   internal = Val {content = NILSXP}}};

And how is it treated when "str" is executed?
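
One possible reading of the dump, offered as an assumption rather than a confirmed answer: the symbol with no print name whose value points back to itself looks like a sentinel meaning "no value yet", both in prom_value and in the sym_value of "datafile". Under that reading, executing the promise would mean evaluating expr in env once, storing the result, and returning the stored result from then on. Here is a minimal sketch of that behaviour in plain C; ToyPromise, toy_unbound and toy_promise_force are hypothetical names, not R's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel meaning "no value yet"; compared by address only. */
static const int toy_unbound = 0;

/* Hypothetical toy promise: a delayed computation, its context,
   and a value slot that starts out pointing at the sentinel. */
typedef struct ToyPromise {
    const int *value;          /* &toy_unbound until the promise is forced */
    int (*expr)(void *env);    /* delayed computation */
    void *env;                 /* context handed to expr */
    int cache;                 /* storage for the forced value */
    int times_evaluated;       /* bookkeeping for the example */
} ToyPromise;

static void toy_promise_init(ToyPromise *p, int (*expr)(void *), void *env)
{
    p->value = &toy_unbound;
    p->expr = expr;
    p->env = env;
    p->cache = 0;
    p->times_evaluated = 0;
}

/* Force the promise: evaluate once, then reuse the cached value. */
static int toy_promise_force(ToyPromise *p)
{
    if (p->value == &toy_unbound) {
        p->cache = p->expr(p->env);
        p->times_evaluated++;
        p->value = &p->cache;
    }
    return *p->value;
}

/* Example delayed computation: doubles the int found in env. */
static int toy_double(void *env)
{
    return 2 * *(int *) env;
}
```

On the OCaml side this has the same shape as a lazy value, so one option for the binding would be to expose a forced promise rather than the promise itself.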

All the best.

-- 
      Guillaume Yziquel
http://yziquel.homelinux.org/

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 21 Nov 2009 - 23:34:54 GMT
