[R] Converting factors back to numbers. Trouble with SPSS import data

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Mon 20 Feb 2006 - 07:16:53 EST


I'm using Fedora Core 4, R-2.2.

The basic question is: can one recover the numerical values used in SPSS after importing data into R with read.spss from the foreign package? Here's why I ask.

My colleague sent an SPSS data set. I must replicate some results she calculated in SPSS and one problem is that the numbers used in SPSS for variable values are not easily recovered in R.

I'm comparing two imported datasets: "eldat" (read.spss without conversion to factors) and "eldatfac" (read.spss with conversion to factors).

If I bring in the data without conversion to factors:

library(foreign)
eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=FALSE,
                   to.data.frame=TRUE)

I can see the variable HAPPY is coded 0, 1, 2, 3. Those are the numbers that SPSS
uses as contrast values when it runs a regression with HAPPY.

In contrast, suppose I allow R to translate the variables that have only a few value labels into factors:

library(foreign)
eldatfac <- read.spss("18CitySCBSsorted.sav", max.value.labels=7, to.data.frame=TRUE)

Consider the first 50 observations on the variable HAPPY:

> f <- eldatfac$HAPPY[1:50]
> f

 [1] Happy          Happy          Very happy     Happy          Very happy
 [6] Very happy     Happy          Very happy     Happy          Very happy
[11] Happy          Happy          Not very happy Very happy     Very happy
[16] Happy          Happy          Very happy     Happy          Happy
[21] Not very happy Happy          Happy          Very happy     Happy
[26] Happy          Happy          Happy          Happy          Happy
[31] Happy          Happy          Happy          Happy          Happy
[36] Happy          Very happy     Very happy     Happy          Very happy
[41] Very happy     Very happy     Happy          Very happy     Very happy
[46] Happy          Happy          Happy          Very happy     Very happy
6 Levels: Not happy at all Not very happy Happy Very happy ... Refused
> levels(f)
[1] "Not happy at all" "Not very happy"   "Happy"            "Very happy"
[5] "Don't know"       "Refused"

I need the numerical values back in order to run a regression the way SPSS does. Isn't this what ?factor says one ought to do? Why are these all missing?

> as.numeric(levels(f))[f]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
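
A small, self-contained sketch of what seems to be going on here, using made-up toy factors rather than the SPSS file: the as.numeric(levels(f))[f] idiom only works when the levels are themselves numbers stored as text. When the levels are labels such as "Happy", as.numeric() cannot parse them and every value comes back NA. The object names below are hypothetical.

```r
# Toy factors (made up for illustration; not taken from the SPSS file).

# Case 1: levels are numbers stored as text, so the ?factor idiom works.
num_f <- factor(c("10", "30", "20", "30"))
num_vals <- as.numeric(levels(num_f))[num_f]
num_vals

# Case 2: levels are text labels, as with HAPPY after read.spss().
lab_f <- factor(c("Happy", "Very happy", "Happy"),
                levels = c("Not happy at all", "Not very happy",
                           "Happy", "Very happy"))

# as.numeric() cannot parse the labels, so every level becomes NA
# (R would also emit a coercion warning, suppressed here).
lab_vals <- suppressWarnings(as.numeric(levels(lab_f)))[lab_f]
lab_vals

# as.numeric() on the factor itself returns the internal 1-based
# level positions, not the original SPSS codes.
lab_codes <- as.numeric(lab_f)
lab_codes
```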

> as.numeric(f)
 [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
[39] 3 4 4 4 3 4 4 3 3 3 4 4

Comparing against the "as.numeric" output from the unconverted variable, I can see that the factor codes are each exactly one higher than the SPSS values.

> g <- eldat$HAPPY[1:50]
> as.numeric(g)
 [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
[39] 2 3 3 3 2 3 3 2 2 2 3 3

I'm more worried about the kinds of variables that are coded irregularly, say 1, 3, 7, 11, in the SPSS scheme, where subtracting one from the factor codes would not recover the original values.
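
One possible approach, sketched under assumptions: if the label-to-code table for a variable is known (for example, copied by hand from the SPSS value labels), a named-vector lookup restores the codes however irregular they are. The labels, codes, and the name spss_codes below are invented for illustration. Depending on the version of foreign, columns that read.spss leaves unconverted may also carry such a table as a "value.labels" attribute; that is worth checking in ?read.spss rather than taking on trust.

```r
# Hypothetical label-to-code table; in practice it would be copied from
# the SPSS value labels for the variable in question.
spss_codes <- c("Never" = 1, "Sometimes" = 3, "Often" = 7, "Always" = 11)

# A factor shaped like one read.spss() might return, levels in code order.
x <- factor(c("Often", "Never", "Always", "Sometimes", "Often"),
            levels = names(spss_codes))

# Internal positions run 1..4, so subtracting 1 would not help here.
positions <- as.numeric(x)
positions

# Look up each observation's label in the table to get the SPSS code.
recovered <- unname(spss_codes[as.character(x)])
recovered
```

The lookup goes through as.character(x) on purpose: indexing the table by the factor itself would use the internal positions, which only happens to work when the table is in the same order as the levels.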

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Feb 20 07:21:54 2006
