[R] Contrasts for ordered factors

From: <lorenz.gygax_at_art.admin.ch>
Date: Mon 08 Jan 2007 - 09:13:39 GMT


Dear all,

I do not seem to grasp how contrasts are set for ordered factors. Perhaps someone can elighten me?

When I work with ordered factors, I would often like to be able to reduce the used polynomial to a simpler one (where possible). Thus, I would like to explicetly code the polynomial but ideally, the intial model (thus, the full polynomial) would be identical to one with an ordered factor.

Here is a toy example with an explanatory variable (EV) with three distinct values (1 to 3) and a continuous response variable (RV):

options (contrasts= c ('contr.treatment', 'contr.poly')) example.df <- data.frame (EV= rep (1:3, 5)) set.seed (298)
example.df$RV <- 2 * example.df$EV + rnorm (15)

I evaluate this data using either an ordered factor or a polynomial with a linear and a quadratic term:

lm.ord <- lm (RV ~ ordered (EV), example.df) lm.pol <- lm (RV ~ EV + I(EV^2), example.df)

I then see that the estimated coefficients differ (and in other examples that I have come across, it is often even more extreme):

coef (lm.ord)
(Intercept) ordered(EV).L ordered(EV).Q   3.9497767 2.9740535 -0.1580798 coef (lm.pol)

(Intercept)            EV       I(EV^2) 
 -0.9015283     2.8774032    -0.1936074 

but the predictions are the same (except for some rounding):

table (round (predict (lm.ord), 6) == round (predict (lm.pol), 6)) TRUE
  15

I thus conclude that the two models are the same and are just using a different parametrisation. I can easily interprete the parameters of the explicit polynomial but I started to wonder about the parametrisation of the ordered factor. In search of an answer, I did check the contrasts:

contr.poly (levels (ordered (example.df$EV)))

                .L         .Q

[1,] -7.071068e-01 0.4082483
[2,] -9.073264e-17 -0.8164966
[3,] 7.071068e-01 0.4082483

The linear part basically seems to be -0.707, 0 (apart for numerical rounding) and 0.707. I can understand that any even-spaced parametrisation is possible for the linear part. But does someone know where the value of 0.707 comes from (it seems to be the square-root of 0.5, but why?) and why the middle term is not exactly 0?

I do not understand the quadratic part at all. Wouldn't that need the be the linear part to the power of 2?

Thank you for your thoughts! Lorenz
-
Lorenz Gygax
Swiss Federal Veterinary Office
Centre for proper housing of ruminants and pigs Tšnikon, CH-8356 Ettenhausen / Switzerland



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon Jan 08 22:40:12 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 10 Jan 2007 - 00:30:25 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.