[Rd] importing explicitly declared missing values in read.spss (foreign)

From: Jeroen Ooms <j.c.l.ooms_at_uu.nl>
Date: Fri, 01 Aug 2008 08:17:52 -0700 (PDT)

There is a problem when importing an SPSS file containing explicitly declared missing values into R using the read.spss function from the foreign package. I'm not sure whether these problems are the same in every version of SPSS; I am using the latest version, 16.0.2.

I included missingdata.sav (http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg (http://www.nabble.com/file/p18776776/frequencies.jpg) as an example. The data contain three types of missing data: two are explicitly declared as missing values ('8' = NA and '9' = NAP), and the third type is the system-missing value. When this file is imported into R, only the system missings are recognized as missing values; the others are imported as levels in the nominal case, and as (labeled) real values 8 and 9 in the continuous case. There are also no attributes in the object returned by read.spss that contain information about which values/levels are the missing values; their missingness seems to be completely ignored by the function.

Is there some way, or some other function, to import SPSS files with an option that replaces all missing values with <NA> in R? Of course this comes with the trade-off of losing the meaning of the missingness when there are multiple types of missingness, but I think this is far less harmful than treating all missing values as normal values.
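One possible workaround, sketched here under assumptions rather than as a feature of the foreign package, is to recode the declared codes by hand after the import. The helper name `recode_declared_missing()` and its `codes` argument are hypothetical, and the small `toy` data frame only mimics the structure of the example file so that the snippet runs without the .sav file.

```r
# Hypothetical helper (not part of 'foreign'): replace user-declared
# missing codes with NA, one column at a time.
recode_declared_missing <- function(df, codes) {
  for (nm in names(codes)) {
    x <- df[[nm]]
    if (is.factor(x)) {
      # Compare on the level labels, then drop the now-unused levels.
      x[as.character(x) %in% codes[[nm]]] <- NA
      x <- droplevels(x)
    } else {
      x[x %in% codes[[nm]]] <- NA
    }
    df[[nm]] <- x
  }
  df
}

# Toy data shaped like the imported file (assumed, not read from disk).
# Note that the string "NA" is a factor level, while NA is a real missing.
toy <- data.frame(
  SUBJECT  = 1:6,
  CATEGORI = factor(c("yes", "no", "NA", "NAP", NA, "yes"),
                    levels = c("yes", "no", "NA", "NAP")),
  CONTINUO = c(3.11, 2.98, 8, 9, NA, 2.10)
)

clean <- recode_declared_missing(
  toy,
  codes = list(CATEGORI = c("NA", "NAP"), CONTINUO = c(8, 9))
)
```

This deliberately collapses NA, NAP and the system missings into a single <NA>, which is exactly the trade-off described above. The codes have to be supplied by hand, since nothing in the imported object identifies them.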

[code]
> mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav",
+ to.data.frame=T)

Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: File-indicated character representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 20 encountered in system file

> mydata

   SUBJECT CATEGORI CONTINUO

1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA

> is.na(mydata$CONTINUO)

 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> is.na(mydata$CATEGORI)

 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE

> summary(mydata)

    SUBJECT CATEGORI CONTINUO

 Min.   : 1   yes :5   Min.   :1.540  
 1st Qu.: 7   no  :5   1st Qu.:3.078  
 Median :13   NA  :5   Median :6.670  
 Mean   :13   NAP :5   Mean   :5.854  
 3rd Qu.:19   NA's:5   3rd Qu.:8.250  
 Max.   :25            Max.   :9.000  
                       NA's   :5.000  

[/code]
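To make the distortion in the summary concrete, the following sketch retypes the CONTINUO column from the printout above (the vector is copied by hand, not read from the file) and compares the mean with and without the declared codes 8 and 9. The variable names are illustrative only.

```r
# CONTINUO as printed above: ten real values, five 8s, five 9s, five NAs.
continuo <- c(3.11, 2.10, 5.34, 1.54, 3.89, 2.98, 4.53, 1.98, 3.68, 2.94,
              rep(8, 5), rep(9, 5), rep(NA, 5))

# Mean with the codes treated as real values, as read.spss leaves them.
mean_with_codes <- mean(continuo, na.rm = TRUE)

# Mean after also treating the declared codes 8 and 9 as missing.
is_declared <- continuo %in% c(8, 9)
mean_valid  <- mean(continuo[!is_declared], na.rm = TRUE)

print(c(with_codes = mean_with_codes, valid_only = mean_valid))
```

The first figure is about 5.85, matching the summary output; the second, based on the ten valid observations only, is about 3.21.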
-- 
View this message in context: http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18776776.html
Sent from the R devel mailing list archive at Nabble.com.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 01 Aug 2008 - 15:20:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 04 Aug 2008 - 10:35:52 GMT.
