[R] How can I import user-defined missings from Spss?

From: Christine Christmann <christinechristmann_at_web.de>
Date: Tue, 15 Apr 2008 11:51:03 +0200


Hi,

I can import SPSS datasets with read.spss from library(foreign) or with spss.get from library(Hmisc). But whichever way I import the data, the user-defined missing values from SPSS are always lost. It makes no difference whether they are a single value, a range, or any combination of these; they are always ignored. Is there any way in R to find out whether a value was a user-defined missing in SPSS? Keeping the information as an attribute would suit me fine; keeping the values as a character string like "miss" would be even better. Converting them to NA, as happens automatically with the system-missing (sysmis) values from SPSS, would be another alternative.

Unfortunately I don't know if any of these options are possible. Could you help me out?

Let me give you an example:
Preconditions: You need SPSS on your computer to generate the SPSS data, and you need to create the folder C:/tmp to save the SPSS file. As you can see, I work with Windows.

*/1) Generate the SpssData:
*/data.
DATA LIST LIST /age (f2) sport (f2).
BEGIN DATA

22, 1 
40, 2
69, 1
19, 2

-99, 9
END DATA.
*/description.
missing values age (LO thru 0).
missing values sport (9).
var label age "age".
var label sport "Do you like sports".
value label sport
1 "yes"
2 "no"
3 "don't know".

*frequencies in Spss.
freq age sport.

save outfile = "C:\tmp\test.sav".

*-----------------------------------------------------------------------------------------.


2) Import the SPSS data into R, via Hmisc or foreign; both work fine.

#import Spssdata in R

spssfile <- "C:/tmp/test.sav"

#via Hmisc

library(Hmisc)
Signs <- c("_")
mydata1 <- spss.get(spssfile,lowernames=TRUE, allow=Signs)

#via foreign

library(foreign)
mydata2 <- read.spss(spssfile,use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)

#freq in r

describe(mydata1)
describe(mydata2)

*-----------------------------------------------------------------------------------------.
Have a look at the two variables age and sport. In SPSS the value -99 in age is a user-defined missing, as is the value 9 in sport. As you can see, the information about the missings is lost in R. What can I do?
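
To make the wish concrete, here is a minimal sketch of the result I am after, done by hand on a hand-built copy of the example data rather than on the imported file. The helper mark_user_missing and the attribute name "user_missing" are hypothetical names made up for this illustration; they are not part of foreign or Hmisc, and the missing-value rules are simply retyped from the SPSS syntax above.

```r
# Hand-built copy of the example data as plain numeric codes
# (an assumption: no value labels applied, no system-missing rows).
mydata <- data.frame(age   = c(22, 40, 69, 19, -99),
                     sport = c(1, 2, 1, 2, 9))

# Hypothetical helper, not part of foreign or Hmisc: set the values flagged
# by is_missing() to NA and keep the original codes in an attribute.
mark_user_missing <- function(x, is_missing) {
  flagged <- !is.na(x) & is_missing(x)
  codes <- ifelse(flagged, x, NA)   # original code where flagged, NA elsewhere
  x[flagged] <- NA
  attr(x, "user_missing") <- codes
  x
}

# Rules retyped by hand from the MISSING VALUES commands.
mydata$age   <- mark_user_missing(mydata$age,   function(v) v <= 0)  # LO thru 0
mydata$sport <- mark_user_missing(mydata$sport, function(v) v == 9)  # 9

is.na(mydata$age)                   # TRUE only for the former -99
attr(mydata$sport, "user_missing")  # original code 9 kept for the last case
```

This only works because I typed the rules in myself, and the attribute is lost again as soon as rows are subset. What I am looking for is a way to get this information from the .sav file at import time.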

Many thanks,
Christine Christmann



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 15 Apr 2008 - 09:57:18 GMT

