Re: [Rd] read.spss issues

From: Thomas Lumley <>
Date: Thu, 16 Feb 2012 09:28:32 +1300

On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms <> wrote:

> The second problem is that the spss dataformat allows to specify
> 'duplicate labels', whereas this is not allowed for factors. read.spss
> does not deal with this and creates a bad factor
> x <- read.spss("",
> use.value.labels=T);
> levels(x$opinion);
> which causes issues downstream. I am not sure if this is an issue in
> read.spss() or as.factor(), but I guess it might be wise to try to
> detect duplicate levels and assign them all with one and the same
> integer value when converting to a factor.

I think this one would be better dealt with by giving an error.

SPSS value labels are just labels, so they don't map very well onto R factors, which are enumerated types. Rather than force them and lose data, I would prefer to make the user decide what to do.
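
As a rough sketch of what "giving an error" could look like: the helper below is hypothetical (its name and arguments are made up for illustration and are not part of read.spss or the foreign package). It assumes the stored codes and their value labels are available as two parallel vectors, and it refuses to build a factor when two codes share a label.

```r
# Hypothetical helper for illustration only; not part of the foreign package.
# x      : vector of stored codes
# codes  : the distinct codes that carry a value label
# labels : the value labels, parallel to `codes`
labelled_to_factor <- function(x, codes, labels) {
  dup <- unique(labels[duplicated(labels)])
  if (length(dup) > 0) {
    # Stop instead of silently merging or mangling levels,
    # so the user decides how to recode.
    stop("duplicate value labels: ", paste(dup, collapse = ", "),
         "; recode or relabel before converting to a factor")
  }
  factor(x, levels = codes, labels = labels)
}

x <- c(1, 2, 3, 2)

# Two codes share the label "Agree": the conversion is refused.
res <- tryCatch(
  labelled_to_factor(x, codes = 1:3,
                     labels = c("Agree", "Agree", "Disagree")),
  error = function(e) conditionMessage(e)
)
print(res)

# Distinct labels convert as usual.
ok <- labelled_to_factor(x, codes = 1:3,
                         labels = c("Agree", "Neutral", "Disagree"))
print(levels(ok))
```

A user who does want the duplicate labels collapsed can then say so explicitly, for example by recoding the values before conversion, rather than having it happen silently.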


Thomas Lumley
Professor of Biostatistics
University of Auckland

Received on Wed 15 Feb 2012 - 20:30:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 15 Feb 2012 - 23:40:17 GMT.