Re: [R] Avoiding factors and levels in data frames

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Mon, 01 Sep 2008 10:22:58 +0100 (BST)


On 01-Sep-08 08:20:25, ONKELINX, Thierry wrote:
>
> Try to add options(stringsAsFactors = FALSE) in your Rprofile.site
> (in the etc directory). Using as.is = TRUE seems safer than
> stringsAsFactors = FALSE in the read.fwf function. Because as.is
> is set to FALSE by default and stringsAsFactors is not set.
>
> HTH,
>
> Thierry

Can I ask for some elucidation about how the code operates here? Apparently read.fwf() calls read.table(), and ?read.fwf refers you to ?read.table for things like 'as.is' and 'stringsAsFactors'.

When I look at the code for read.table, I see in the parameter list:

function (file, .... , as.is = !stringsAsFactors, ... ,
          stringsAsFactors = default.stringsAsFactors(), ... )

with *no further reference whatever* to 'stringsAsFactors' in the body of the function. In particular, there is no test that I can see of whether or not 'stringsAsFactors' has been set by the user in the call.

The standard result of default.stringsAsFactors() is TRUE.

I've written a tiny test function:

  temp<-function(as.is = !stringsAsFactors,
        stringsAsFactors = default.stringsAsFactors()){
    print(c(as.is=as.is, sAF=stringsAsFactors))
  }

  temp()
# as.is   sAF
# FALSE  TRUE

  temp(stringsAsFactors = FALSE)
# as.is   sAF
#  TRUE FALSE

  temp(as.is=FALSE,stringsAsFactors = FALSE)
# as.is   sAF
# FALSE FALSE

So, if read.table is called with 'as.is=FALSE' (which is the default set by read.fwf(), with any reference to 'stringsAsFactors' in the call being part of the "..." which is passed to read.table()), then read.table will be called with 'as.is=FALSE' regardless of whether 'stringsAsFactors=FALSE' has been set explicitly in calling read.fwf().
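
To make that pass-through concrete, here is a minimal sketch with two hypothetical stand-in functions, inner_reader() and outer_reader(). They are not the real read.table()/read.fwf() source, only an imitation of the argument handling described above, and the stringsAsFactors default is hard-coded to TRUE as an assumption rather than taken from default.stringsAsFactors():

```r
# Hypothetical stand-in for read.table(): as.is defaults lazily to !stringsAsFactors.
inner_reader <- function(as.is = !stringsAsFactors, stringsAsFactors = TRUE) {
  c(as.is = as.is, sAF = stringsAsFactors)
}

# Hypothetical stand-in for read.fwf(): it has its own as.is = FALSE default
# and always hands that value on explicitly; everything else travels in "...".
outer_reader <- function(as.is = FALSE, ...) {
  inner_reader(as.is = as.is, ...)
}

# stringsAsFactors reaches the inner function, but as.is was already supplied,
# so its lazy default is never consulted.
res_saf <- outer_reader(stringsAsFactors = FALSE)
print(res_saf)    # as.is = FALSE, sAF = FALSE

# Only an explicit as.is in the outer call changes the outcome.
res_asis <- outer_reader(as.is = TRUE)
print(res_asis)   # as.is = TRUE, sAF = TRUE
```

The point of the sketch is only that a default of the form 'as.is = !stringsAsFactors' is used when 'as.is' is missing from the call, and a wrapper that always supplies 'as.is' makes it never missing.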

The only way to get 'as.is' to be TRUE would be to set it explicitly in the call to read.fwf() (and in that case one need not bother with 'stringsAsFactors', since its only purpose seems to be to determine the value of 'as.is'). Or, of course, to set default.stringsAsFactors to be FALSE; but in many cases people will want to have per-case control over what happens in situations like this.
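
As a small sketch of the explicit route, assuming a throwaway two-line fixed-width file (the file contents, widths and column names below are made up for illustration and are not Asher's data):

```r
# Hypothetical sample data written to a temporary file.
tf <- tempfile(fileext = ".txt")
writeLines(c("Gustav wind ", "Hanna  rain "), tf)

# Set as.is explicitly in the read.fwf() call instead of relying on
# stringsAsFactors being honoured further down.
raw <- read.fwf(tf, widths = c(7, 5), col.names = c("storm", "kind"),
                as.is = TRUE)

print(sapply(raw, class))   # both columns come back as "character"

unlink(tf)                  # remove the temporary file
```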

Well, that's how it seems to me, on reading the code. Is this what Thierry really means when he says "stringsAsFactors is not set"?

If that is the case, then it seems to indicate some conflict or inconsistency between read.fwf() and read.table() in this respect. In any case, it strikes me as something of an undesirable tangle!

With thanks for any comments,
Ted.

> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
> On Behalf Of Asher Meir
> Sent: Sunday, 31 August 2008 11:02
> To: r-help_at_r-project.org
> Subject: [R] Avoiding factors and levels in data frames
>
> Hello all.
>
> I am an experienced R user, I have used R for many years for a wide
> variety of applications. However, I keep on running into one obstacle:
> I never want factors or levels in my data frames, but I keep on
> getting them. Is there any way to globally turn this whole feature of
> data frames off? Using options(stringAsFactors=FALSE) does not seem to
> work.
> Alternatively, if I have a data frame with levels, can I just get rid
> of them in that data frame?
>
> Here is an example: I have a large text file, of which part is in the
> fixed-width tabular form I need. I created a widths vector and a
> column names vector. I then read the file as follows:
>
> raw1<-read.fwf(fn1,widths=widmax,col.names=headermax,
>                stringsAsFactors=FALSE)
>
> But raw1 still has factors! It is an old class data frame:
>
> > is(raw1)
> [1] "data.frame" "oldClass"
>
> And it still has levels:
>
> > raw1[1,1]
> [1] Gustav wind
> 229 Levels: - - - - - - - - - - - WIN - - - M ... Z INDICATES
> C
>
> My question is:
> 1. Can I get rid of the levels in raw1?
> 2. Even better -- can I stop it getting read in as a data frame with
> factors?
> 3. Even better -- can I just tell R to never use factors in my data
> frames?
>
> Or any other solution that occurs to people -- maybe this is the wrong
> way to go about reading in fixed width data in this kind of file.
>
> I would appreciate any help.
>
> Asher
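
On Asher's first question (getting rid of the levels in a data frame that already has them), one common idiom is to convert each factor column with as.character(). A minimal sketch, using a made-up data frame 'dat' in place of raw1:

```r
# Hypothetical data frame standing in for raw1; stringsAsFactors = TRUE
# forces the text column to be a factor for the sake of the example.
dat <- data.frame(storm = c("Gustav", "Hanna"),
                  speed = c(110, 65),
                  stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character;
# other columns (here the numeric 'speed') are left untouched.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.character)

str(dat)
```

Whether this is appropriate for raw1 depends on what the columns are meant to hold; the sketch only removes the factor coding and does not re-parse anything as numbers.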


E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-08                                       Time: 10:22:55
------------------------------ XFMail ------------------------------

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 01 Sep 2008 - 09:28:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 01 Sep 2008 - 11:34:08 GMT.
