From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>

Date: Tue, 15 Apr 2008 17:49:45 -0500

jayhegde wrote:

> Dear List,
>
> I have two questions about how to do predictions using lrm, specifically
> how to predict the ordinal response for each observation *individually*.
> I'm very new to cumulative odds models, so my apologies if my questions are
> too basic.
>
> I have a dataset with 4000 observations. Each observation consists of
> an ordinal outcome y (i.e., rating of a stimulus with four possible ratings,
> 1 through 4), and the values of two predictor variables x1 and x2 associated
> with each stimulus:
>
> ---------------------------------------
> Obs#   y   x1     x2
> ---------------------------------------
> 1      3   2.35   -1.07
> 2      2   1.78   -0.66
> 3      4   5.19   -3.51
> ...
> 4000   1   0.63   -0.23
> ---------------------------------------
>
> I get excellent fits using
>
> fit1 <- lrm(y ~ x1+x2, data=my.dataframe1)
>
> Now I want to see how well my model can predict y for a new set of 4000
> observations. I need to predict y for each new observation *individually*.
> I know an expression like
>
> predicted1 <- predict(fit1, newdata=my.dataframe2, type="fitted.ind")
>
> can give the *probability* of each of the 4 possible responses for each
> observation. So my questions are
>
> (1) How do I pick the likeliest y (i.e., likeliest of the 4 possible
> ratings) for each given new observation?
>
> (2) Are there good references that explain the theory behind this type of
> prediction to a beginner like me?
>
> Thank you very much,
> Jay Hegdé
> University of Minnesota

You can easily pick the highest-probability category after running predict(fit, newdataset, type='fitted.ind'), but scoring the model by how often that category is correct is an improper scoring rule (i.e., an accuracy score that is optimized by the wrong model). I suggest instead computing the Somers' Dxy rank correlation between the predicted log odds (for any one intercept; it doesn't matter which one) and the observed ordinal category.
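A minimal sketch of both steps, assuming the rms and Hmisc packages are installed. The simulated data and the names sim, train, test, ratings, likeliest, lp and dxy are illustrative stand-ins for my.dataframe1 and my.dataframe2, not part of the original question. Using rcorr.cens() for Dxy with an ordinal response is one possible choice; it treats a plain vector as uncensored.

```r
library(rms)    # lrm() and its predict method
library(Hmisc)  # rcorr.cens()

set.seed(1)

# Hypothetical simulated data: four ordered ratings driven by x1 and x2
sim <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  latent <- 1.5 * x1 - x2 + rlogis(n)
  y <- cut(latent, c(-Inf, -1.5, 0, 1.5, Inf), labels = FALSE)  # ratings 1..4
  data.frame(y, x1, x2)
}
train <- sim(4000)
test  <- sim(4000)

fit1 <- lrm(y ~ x1 + x2, data = train)

# Individual category probabilities: one row per new observation,
# one column per rating
p <- predict(fit1, newdata = test, type = "fitted.ind")

# Most probable rating in each row (for inspection only, not for
# judging the model)
ratings   <- sort(unique(train$y))
likeliest <- ratings[max.col(p, ties.method = "first")]

# Predicted log odds for the new data, then Somers' Dxy against
# the observed ordinal category
lp  <- predict(fit1, newdata = test, type = "lp")
dxy <- rcorr.cens(lp, test$y)["Dxy"]
```

Here table(likeliest, test$y) would show how the most probable rating lines up with the observed one, while dxy summarizes how well the predicted log odds rank the new observations.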

Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 15 Apr 2008 - 22:52:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 15 Apr 2008 - 23:30:29 GMT.
