From: Inman, Brant A. M.D. <Inman.Brant_at_mayo.edu>

Date: Sun 07 Jan 2007 - 16:27:34 GMT

Date: Sun 07 Jan 2007 - 16:27:34 GMT

Thank you for the help. Indeed, the differences in the results that I
noted were due to the incorrect ordering of the response variable that
resulted from my unattentive use of as.ordered on a character vector.

Brant

-----Original Message-----

From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk]
Sent: Sunday, January 07, 2007 2:52 AM

To: Inman, Brant A. M.D.

Cc: r-help@stat.math.ethz.ch; yee@stat.auckland.ac.nz
Subject: Re: [R] Using VGAM's vglm function for ordinal logistic
regression

On Sat, 6 Jan 2007, Inman, Brant A. M.D. wrote:

*>
*

> R-Experts:

*>
**> I am using the vglm function of the VGAM library to perform
*

proportional

> odds ordinal logistic regression. The issue that I would like help

with

> concerns the format in which the response variable must be provided

for

> this function to work correctly.

*> pneumo2$severity
*

[1] normal normal normal normal normal normal normal normal mild
mild

[11] mild mild mild mild mild mild severe severe severe
severe

[21] severe severe severe severe

Levels: mild < normal < severe

is different from the ordering in the first example.

The difference between vglm and polr is that the latter uses the conventional sign for the coefficients: see MASS4 p.204.

I would never use as.ordered on a character vector, as this leaves it to
R

to choose the ordering. (Even if you think you intended alphabetical
order, that depends on the locale: see the warnings on the help page.)

> Consider the following example:

*>
**> ------
**>
**> library(VGAM)
**> library(MASS)
**>
**> attach(pneumo)
**> pneumo # Inspect the format of the original dataset
**>
**> # Restructure the pneumo dataset into a different format
**> pneumo2 <- data.frame(matrix(ncol=3, nrow=24))
**> colnames(pneumo2) <- c('exposure.time', 'severity', 'freq')
**> pneumo2[,1] <- rep(pneumo[,1],3)
**> pneumo2[,2] <-
**> as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
**> pneumo2[1:8,3] <- pneumo[,2]
**> pneumo2[9:16,3] <- pneumo[,3]
**> pneumo2[17:24,3] <- pneumo[,4]
**> pneumo2 # Inspect the format of the new modified dataset
**>
**> ------
**>
**> The problem occurs when I try to analyze these two datasets, which are
**> identical in content, with the proportional odds model using vglm:
**>
**> ------
**>
**> # Analyze the original dataset with vglm, get one set of results
**>
**>> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),
**> data=pneumo,
**> + family=cumulative(parallel=T))
**>
**> Coefficients:
**> (Intercept):1 (Intercept):2 log(exposure.time)
**> 9.676093 10.581725 -2.596807
**>
**> Degrees of Freedom: 16 Total; 13 Residual
**> Residual Deviance: 5.026826
**> Log-likelihood: -204.2742
**>
**> # Analyzing the modified dataset with vglm gives another set of
*

results

*>
*

>> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2,

*> + family=cumulative(parallel=T))
**>
**> Coefficients:
**> (Intercept):1 (Intercept):2 log(exposure.time)
**> -1.6306499 2.5630170 -0.1937956
**>
**> Degrees of Freedom: 48 Total; 45 Residual
**> Residual Deviance: 503.7251
**> Log-likelihood: -251.8626
**> Warning messages:
**> 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
**> trace, wzeps = control$wzepsilon)
**> 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
**> trace, wzeps = control$wzepsilon)
**> 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
**> trace, wzeps = control$wzepsilon)
**> 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
**> trace, wzeps = control$wzepsilon)
**> 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
**> trace, wzeps = control$wzepsilon)
**>
**> # Analyzing the modified dataset with polr reproduces these second
**> results
**>
**>> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)
**>
**> Coefficients:
**> log(exposure.time)
**> 0.1938154
**>
**> Intercepts:
**> mild|normal normal|severe
**> -1.630612 2.563055
**>
**> Residual Deviance: 503.7251
**> AIC: 509.7251
**>
**> ------
**>
**> Can someone explain what I am doing wrong when using vglm and polr
*

with

> the modified dataset? I do not understand why these two formulations

*> should give different results.
**>
**> Brant Inman
**>
**> ______________________________________________
**> R-help@stat.math.ethz.ch mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide
*

http://www.R-project.org/posting-guide.html

*> and provide commented, minimal, self-contained, reproducible code.
**>
*

-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Mon Jan 08 12:37:06 2007

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Mon 08 Jan 2007 - 03:30:25 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*