Re: [R] Unexpected behaviour as.data.frame

From: Bert Gunter <gunter.berton_at_gene.com>
Date: Sun, 15 May 2011 13:17:31 -0700

Inline below.

On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp_at_eoos.dds.nl> wrote:
> Thanks. I also noticed myself minutes after sending my message to the list.
> My 'please ignore my question it was just a stupid typo' message was sent
> with the wrong account and is now awaiting moderation.
>
> However, my other question still stands: what is the
> preferred/fastest/simplest way to create a data.frame with given column types
> and dimensions?

I do not know, but why is simply

data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500 numeric (= double) and 100 character columns to add, you might do something like:

> z <- matrix(numeric(5000), nrow = 10)    ## 10 x 500 matrix of zeros
> u <- matrix(character(1000), nrow = 10)  ## 10 x 100 matrix of empty strings
> frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns

While this might save some typing, it may not be much more efficient than typing it all out -- maybe just some parsing time is saved. You can experiment and see.
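
Another way to avoid typing the columns out is to build them from a vector of type names, much as Jan's original code does. The following is only a minimal sketch: make_empty_df and its col_names argument are hypothetical names introduced here, not part of base R, and it assumes every entry of types is a mode that vector() accepts ("integer", "character", "double", and so on).

```r
# Hypothetical helper (not from this thread or from base R): build a
# data frame with `nrows` rows and one column per entry in `types`.
make_empty_df <- function(types, nrows,
                          col_names = paste0("V", seq_along(types))) {
  # vector("integer", 10) is equivalent to integer(10), and so on.
  cols <- lapply(types, vector, length = nrows)
  names(cols) <- col_names
  # stringsAsFactors is spelled out in full so that it is matched
  # as the intended argument.
  as.data.frame(cols, stringsAsFactors = FALSE)
}

frm <- make_empty_df(c("integer", "character", "double"), nrows = 10)
str(frm)
```

Naming the list before conversion also avoids the long deparsed column names seen in the str() output quoted further down.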

However, a data.frame **is** a list with added attributes, and a great deal of the constructor's work goes into constructing and checking these attributes (e.g. row and column names), so I see nothing terribly inefficient in what you did. It's just a bit obscure. But maybe someone with greater expertise will set us both straight.
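
To illustrate the "list with added attributes" point, here is a rough sketch that turns a plain named list into a data frame by setting the attributes by hand. The column names x, y and z are made up for the example. This skips every check that data.frame() performs, so it is meant as an illustration of the structure rather than a recommended practice.

```r
# Start from a plain named list of equal-length columns.
cols <- list(x = numeric(10), y = character(10), z = integer(10))

# Add the attributes that make the list a data frame. No checks are
# run here, so the caller must guarantee that every column has the
# same length as the row names.
attr(cols, "row.names") <- seq_len(10)
class(cols) <- "data.frame"

is.list(cols)        # still a list underneath
is.data.frame(cols)  # and now also a data frame
str(cols)
```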

Cheers,
Bert

>
> Regards,
> Jan
>
>
> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>
>> In your post, you're missing the final "s" on the stringsAsFactors
>> argument in the d1 assignment. When I typed it correctly, it works as
>> expected.
>>
>> -- Bert
>>
>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp_at_eoos.dds.nl> wrote:
>>>
>>> I use the following code to create two data.frames d1 and d2 from a list:
>>> types<- c("integer", "character", "double")
>>> nlines<- 10
>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>> stringsAsFactor=FALSE)
>>> l2<- lapply(types, do.call, list(nlines))
>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>
>>> I would expect d1 and d2 to be the same; however, in d1 the second column
>>> is a factor, while in d2 it is a character vector (which I would expect):
>>>
>>>> str(d1)
>>>
>>> 'data.frame':   10 obs. of  3 variables:
>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>  $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>
>>>> str(d2)
>>>
>>> 'data.frame':   10 obs. of  3 variables:
>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>  $ c........................................: chr  "" "" "" "" ...
>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>
>>>
>>> A different but related question: I use the commands above to create an
>>> 'empty' data.frame with specified column types and dimensions. I need this
>>> data.frame to pass on to my C++ routines. Is there a simpler or more
>>> elegant way of creating this data.frame?
>>>
>>> Regards,
>>>
>>> Jan
>>>
>>>
>>> PS:
>>> I am running R on 64 bit Ubuntu 11.04:
>>>
>>>> sessionInfo()
>>>
>>> R version 2.12.1 (2010-12-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
>

-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

Received on Sun 15 May 2011 - 20:19:38 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 16 May 2011 - 08:40:07 GMT.