Re: [Rd] Don't dput() data frames?

From: Simon Urbanek <simon.urbanek_at_r-project.org>
Date: Tue, 28 Aug 2012 14:27:53 -0400

On Aug 28, 2012, at 2:14 PM, "R. Michael Weylandt" <michael.weylandt_at_gmail.com> wrote:

> On Tue, Aug 28, 2012 at 1:00 PM, Simon Urbanek
> <simon.urbanek@r-project.org> wrote:

>> 
>> On Aug 28, 2012, at 1:51 PM, R. Michael Weylandt wrote:
>> 
>>> /src/main/attrib.c contains this comment in row_names_gets():
>>> 
>>> /* This should not happen, but if a careless user dput()s a
>>>          data frame and sources the result, it will */
>>> 
>>> which svn blame says Prof Ripley placed there in r39830 with the
>>> commit message "correct the work of dput() on the row names of a data
>>> frame with compact representation."
>>> 
>>> Is there a problem / better way to use the result of a hefty dput than
>>> source()ing it?
>> 
>> It's pretty much the least efficient and most dangerous (as in insecure) way. That's why there is serialization instead ...
>> 

>
> My most common use of dput() is for sending plain text data over
> r-help; would this be an official/unofficial advisement to push folks
> to use
>
> serialize(x, NULL, ascii = TRUE)
>
> instead? At first blush that seems to be less space efficient:
>
> sum(nchar(capture.output(dput(iris)))) # 3767
>
> sum(nchar(serialize(iris, NULL, ascii = TRUE))) # 5922: probably even
> # more if we dump it properly to plain text in a copy+pasteable form
>

No, if you want small, readable snippets you can certainly use dput(), but when you say data frame I don't imagine anything that can be sent by e-mail :). Obviously, for toy examples you don't care about performance ...
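Here is a minimal sketch of the two plain-text routes discussed above. It uses a small hypothetical data frame `df` that does not come from the thread. The `dput()` text has to be evaluated before it becomes data again. Here that is done with `eval(parse())`, which is no safer than `source()`. The ASCII serialization, by contrast, is restored with `unserialize()`, which reads data and runs no code.

```r
# Sketch: two plain-text ways to pass a small data frame around.
# 'df' is a hypothetical toy example, not data from the thread.
df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

# 1. dput(): readable R code; the recipient must evaluate it to get data back
dput_text <- paste(capture.output(dput(df)), collapse = "\n")
from_dput <- eval(parse(text = dput_text))

# 2. ASCII serialization: not meant for human reading, but nothing is evaluated
ser_text <- rawToChar(serialize(df, NULL, ascii = TRUE))
from_ser <- unserialize(charToRaw(ser_text))

cat(dput_text, "\n")
stopifnot(identical(df, from_dput), identical(df, from_ser))
```

For a toy example like this, the readable `dput()` text is usually what people want to paste into a message.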

As for size efficiency:

> save(iris, file = "iris.RData")
> file.info("iris.RData")$size
[1] 1100

so in base64 that would be about 1.5k - much less than any of the above.
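Below is a rough sketch of the same comparison done in memory. It assumes that gzip-compressed `serialize()` output is a fair stand-in for what `save()` writes, and exact sizes will vary with the R version and serialization format. Byte counts use `length()` because `serialize()` returns a raw vector. Base R has no base64 encoder, so the base64 size is only estimated from the rule that every 3 bytes become 4 characters (1100 bytes gives roughly 1467 characters).

```r
# Sketch: compare dput() text size with a compressed binary serialization.
# Assumption: memCompress(serialize(...)) approximates what save() writes.
data(iris)

# Characters in the deparsed (dput) representation
dput_chars <- sum(nchar(capture.output(dput(iris))))

# Binary serialization (a raw vector), gzip-compressed in memory
gz_bytes <- memCompress(serialize(iris, NULL), type = "gzip")

# base64 turns each 3-byte group into 4 characters
base64_chars <- 4 * ceiling(length(gz_bytes) / 3)

cat("dput characters:        ", dput_chars, "\n")
cat("compressed serialization:", length(gz_bytes), "bytes\n")
cat("base64 size (estimate): ", base64_chars, "characters\n")

# Round trip: unserialize() restores the object without evaluating any code
iris2 <- unserialize(memDecompress(gz_bytes, type = "gzip"))
stopifnot(identical(iris, iris2))
```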

Cheers,
Simon

> Michael
>

>> Cheers,
>> Simon
>> 
>> 
>> 
>>> This seems to work rather robustly:
>>> 
>>> data(iris)
>>> source(textConnection(paste0("iris2 <- ", capture.output(dput(iris)))))
>>> identical(iris, iris2)
>>> 
>>> Cheers,
>>> Michael
>>> 
>>> ______________________________________________
>>> R-devel_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>> 
>> 



Received on Tue 28 Aug 2012 - 18:34:15 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 29 Aug 2012 - 08:30:41 GMT.
