Re: [Rd] read.spss issues

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Wed, 15 Feb 2012 17:04:33 -0500

On Feb 15, 2012, at 3:28 PM, Thomas Lumley wrote:

> On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms <jeroen.ooms@stat.ucla.edu> wrote:
>
>> The second problem is that the spss dataformat allows to specify
>> 'duplicate labels', whereas this is not allowed for factors.
>> read.spss does not deal with this and creates a bad factor
>>
>> x <- read.spss("http://www.stat.ucla.edu/~jeroen/spss/duplicate_labels.sav",
>>                use.value.labels=T);
>> levels(x$opinion);
>>
>> which causes issues downstream. I am not sure if this is an issue in
>> read.spss() or as.factor(), but I guess it might be wise to try to
>> detect duplicate levels and assign them all with one and the same
>> integer value when converting to a factor.
>
> I think this one would be better dealt with by giving an error.
>
> SPSS value labels are just labels, so they don't map very well onto R
> factors, which are enumerated types. Rather than force them and lose
> data, I would prefer to make the user decide what to do.

I could imagine that users might appreciate being able to get the data from read.spss in one pass, and the labels from a separate function that makes a best guess at what is needed without trying to match values to factor levels unambiguously for every variable. For big datasets, only a few edits might then be needed to throw out duplicates, which would save a lot of typing errors.
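
As a rough sketch of what such a separate label-checking step might look like in base R: the helper name duplicate_value_labels and the toy data are hypothetical and not part of the foreign package. The sketch assumes the data were read with use.value.labels = FALSE, so that each labelled variable carries a "value.labels" attribute (a named vector whose names are the labels and whose values are the codes).

```r
# Hypothetical helper (not in 'foreign'): list, per variable, the value
# labels that are attached to more than one code.
# Assumes each labelled variable has a "value.labels" attribute, as set by
# foreign::read.spss(..., use.value.labels = FALSE).
duplicate_value_labels <- function(dat) {
  out <- lapply(dat, function(v) {
    labs <- names(attr(v, "value.labels"))
    unique(labs[duplicated(labs)])
  })
  # Keep only the variables that actually have duplicated labels
  out[lengths(out) > 0]
}

# Made-up stand-in for the result of read.spss(file, use.value.labels = FALSE)
opinion <- c(1, 2, 3, 2)
attr(opinion, "value.labels") <- c(agree = 1, neutral = 2, agree = 3)
dat <- list(opinion = opinion, age = c(30, 41, 25, 52))

dups <- duplicate_value_labels(dat)
print(dups)
```

The user could then edit only the variables reported here before converting anything to a factor, leaving the numeric codes untouched in the meantime.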

-- 

David Winsemius, MD
West Hartford, CT

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 15 Feb 2012 - 22:06:57 GMT
