# Re: [R] spss.read factor reversal

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed 27 Jul 2005 - 19:01:19 EST

> I think it is doing what it is supposed to do, but I have never used
> read.spss, so take this with a pinch of salt.
>
> In R, when you use as.integer on a factor, the first level gets a
> value of 1, the second a value of 2, and so on. The order of the levels
> can be determined from the levels() function.
>
> f <- factor( c("Green", "Green", "Red", "Blue"),
> levels=c("Red", "Blue", "Green") )
> levels(f)
> [1] "Red" "Blue" "Green"
>
> as.integer(f)
> [1] 3 3 1 2
>
> But the levels of a factor can be changed
>
> as.integer( factor( f, levels=c("Green", "Blue", "Red" ) ) )
> [1] 1 1 3 2

Doesn't explain why 1 2 3 in the input file comes out as Green Blue Red, does it?

> You can also try setting use.value.labels=FALSE in read.spss function
> and then creating a factor out of it.

Would be interesting to see this. I would suspect that the damage is already done at that point though.
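
A minimal sketch of the suggestion quoted above, assuming the raw codes come back as plain numbers when use.value.labels=FALSE. No .sav file is read here: `codes` is a hypothetical stand-in for the COLOR column, copied from Joel's data editor listing, and the code-to-label mapping is the one he describes (red=1, blue=2, green=3). Whether this sidesteps the problem he sees is not established in this thread.

```r
# Stand-in for the COLOR column as read with use.value.labels = FALSE
# (hypothetical; values copied from the SPSS data editor listing).
codes <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Build the factor by hand so that level i corresponds to SPSS code i.
color <- factor(codes, levels = 1:3, labels = c("red", "blue", "green"))

levels(color)      # level order follows the SPSS codes
as.integer(color)  # internal codes now coincide with the SPSS codes
```

With the levels listed in ascending code order, as.integer() on the factor gives back the SPSS codes, so no inversion is needed.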

```
rval[[nm]] <- factor(rval[[nm]], levels = vl[[v]],
                     labels = trim(names(vl[[v]])))
```

i.e. levels and labels should be in the correct order.
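
To illustrate what a call of that shape does, here is a sketch that uses the label table from Joel's output as `vl` (a named vector: the names are the labels, the values are the SPSS codes). The names `vl` and `x` are stand-ins for the read.spss internals, and trim() is left out on the assumption that the labels carry no padding.

```r
# Value-label table as printed in attr(, "label.table") for COLOR
vl <- c(green = 3, blue = 2, red = 1)

# Raw SPSS codes from the data editor
x <- c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3)

# Same shape as the factor() call above: codes as levels, names as labels
f <- factor(x, levels = vl, labels = names(vl))

as.character(f)  # code 1 maps to "red", 2 to "blue", 3 to "green"
levels(f)        # level order follows vl: green, blue, red
as.integer(f)    # positions within levels(f), not the SPSS codes
```

The labels end up attached to the right codes; only the internal integer codes differ from the SPSS values, because the label table lists the codes in descending order.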

But something is odd, you'd expect the following effect:

```
> x <- 1:3
> factor(x,levels=3:1,labels=c("G","B","R"))
[1] R B G
Levels: G B R
> as.integer(factor(x,levels=3:1,labels=c("G","B","R")))
[1] 3 2 1
```

but Joel's output has the levels in the order R B G, which contradicts the

attr(,"label.table")$COLOR
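
One way to check a result against its label table, and to get back the original SPSS codes without arithmetic on as.integer(), is to look the labels up in the table by name. A sketch only: `out` is built by hand here to mimic a correctly labelled result and is not Joel's actual object; `lt` and `spss_codes` are illustrative names.

```r
# Hand-built stand-in for a read.spss result (hypothetical)
vl <- c(green = 3, blue = 2, red = 1)
out <- list(COLOR = factor(c(1, 2, 3, 1, 1, 1, 2, 2, 2, 3, 3),
                           levels = vl, labels = names(vl)))
attr(out, "label.table") <- list(COLOR = vl)

lt <- attr(out, "label.table")$COLOR

# Do the factor levels appear in the same order as the label table?
identical(levels(out$COLOR), names(lt))

# Recover the SPSS codes by label lookup rather than from as.integer()
spss_codes <- unname(lt[as.character(out$COLOR)])
spss_codes
```

If the first comparison comes out FALSE on a real object, the levels and the label table disagree, which is the inconsistency pointed out above.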

BTW, this is R 2.1.1, I hope Joel isn't wasting our time by using an older version...

-p

>
>
>
> On Tue, 2005-07-26 at 17:04 -0700, Joel Bremson wrote:
> > Hi,
> >
> > I'm having a problem with read.spss reversing my factor input.
> >
> > Here is the input copied from the spss data editor:
> >
> > color cost
> > 1 2.30
> > 2 2.40
> > 3 3.00
> > 1 2.10
> > 1 1.00
> > 1 2.00
> > 2 4.00
> > 2 3.20
> > 2 2.33
> > 3 2.44
> > 3 2.55
> >
> > For color, red=1, blue=2, and green=3. Its type is 'String' and
> >
> > >out
> >
> > $COLOR
> > [1] green blue red green green green blue blue blue red red
> > Levels: red blue green
> >
> > $COST
> > [1] 2.30 2.40 3.00 2.10 1.00 2.00 4.00 3.20 2.33 2.44 2.55
> >
> > attr(,"label.table")
> > attr(,"label.table")$COLOR
> > green  blue   red
> >     3     2     1
> >
> > attr(,"label.table")$COST
> > NULL
> >
> > attr(,"variable.labels")
> >   COLOR    COST
> > "color"  "cost"
> >
> > =====EOF===================
> >
> > Notice that the $COLOR factor data are inverted. Looking at the integer
> > output, we see:
> >
> > > as.integer(out$COLOR)
> > [1] 3 2 1 3 3 3 2 2 2 1 1
> >
> > The spss original data looks like this:
> > 1 2 3 1 1 1 2 2 2 3 3
> >
> > I can easily invert the output mathematically with:
> > q = sapply(m,function(x){ x + 2*(median(unique(m))-x)})
> >
> > (m is composed of sequential integers starting at one)
> >
> > but it seems as though something is going wrong in read.spss.
> >
> > Any ideas?
> >
> > Joel Bremson
> > Graduate Student
> > UC Davis
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >

```--
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

```
Received on Wed Jul 27 19:05:48 2005
