Re: [R] Appropriate regression model for categorical variables

From: Robert A LaBudde <ral_at_lcfltd.com>
Date: Tue, 12 Jun 2007 14:08:37 -0400

At 01:45 PM 6/12/2007, Tirtha wrote:
>Dear users,
>In my psychometric test i have applied logistic regression on my data. My
>data consists of 50 predictors (22 continuous and 28 categorical) plus a
>binary response.
>
>Using glm(), stepAIC() i didn't get satisfactory result as misclassification
>rate is too high. I think categorical variables are responsible for this
>debacle. Some of them have more than 6 level (one has 10 level).
>
>Please suggest some better regression model for this situation. If possible
>you can suggest some article.

  1. Usually, if a factor has many levels, there is a natural order to the levels. If so, consider fitting the factor as an ordered factor.
  2. Break the factor levels into 2 or 3 groups that have some rational connection. Then fit the factor with a smaller number of levels. E.g., "race" might have levels "white", "black", "asian", "pacific", "Spanish surname", "other". Consider a change to "white", "nonwhite".
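
A minimal R sketch of the two suggestions above, on simulated data. The data frame and every column name in it (dat, edu, race, x, y) are hypothetical stand-ins, not the poster's variables, and the response is generated at random, so the fitted coefficients carry no meaning. Only the recoding steps are the point.

```r
# Hypothetical data: one factor with a natural order (edu), one without (race),
# one continuous predictor (x) and a binary response (y).
set.seed(1)
n <- 200
dat <- data.frame(
  edu  = factor(sample(c("primary", "secondary", "bachelor", "master"),
                       n, replace = TRUE)),
  race = factor(sample(c("white", "black", "asian", "pacific", "other"),
                       n, replace = TRUE)),
  x    = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x))

# Suggestion 1: declare the natural order of the levels.
# glm() then codes the factor with polynomial contrasts (.L, .Q, .C).
dat$edu_ord <- factor(dat$edu,
                      levels = c("primary", "secondary", "bachelor", "master"),
                      ordered = TRUE)

# Suggestion 2: collapse many levels into a few rational groups.
dat$race2 <- factor(ifelse(dat$race == "white", "white", "nonwhite"),
                    levels = c("white", "nonwhite"))

# Refit the logistic regression with the recoded factors.
fit <- glm(y ~ x + edu_ord + race2, data = dat, family = binomial)
summary(fit)
```

Note that an ordered factor with k levels still uses k - 1 coefficients under the default polynomial contrasts. The gain is that the terms are trend components (linear, quadratic, ...), so the higher-order ones can be inspected and dropped if they contribute nothing.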

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral_at_lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 12 Jun 2007 - 18:28:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 12 Jun 2007 - 18:31:56 GMT.
