Re: [R] coding for categorical variables with unequal observations

From: Nordlund, Dan (DSHS/RDA)
Date: Thu, 03 Apr 2008 14:45:25 -0700

> -----Original Message-----
> From: Tanya Yatsunenko
> Sent: Thursday, April 03, 2008 1:55 PM
> Subject: [R] coding for categorical variables with unequal observations
> Hi,
> I am doing multiple regression and have several X variables that are
> categorical.
> I read that I can use dummy or contrast codes for them, but are there
> any special rules when there are unequal numbers of observations in each
> group (4 females vs. 7 males in a "gender" variable)?
> Also, can R generate these codes for me?
> Thanks.

You don't need to do anything special for unequal group sizes, and yes, you can let SAS create the codes for you. In most of the regression PROCs you can list your categorical variables in a CLASS statement. Depending on the procedure, you may also be able to specify whether you want effect or dummy coding, and which level of each categorical variable serves as the "comparison" (reference) level. You can also use PROC GLMMOD to create design variables and then feed them into other PROCs. Other approaches are possible as well.
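
As a rough illustration of what dummy versus effect coding does with unequal groups, here is a minimal Python sketch. It is not how SAS's CLASS statement is implemented internally. The helper names code_columns and ols are hypothetical, and the outcome values are made up; only the 4-female / 7-male group sizes come from the question. The fit itself needs no adjustment for unequal n, but the meaning of the intercept depends on the coding.

```python
# Minimal sketch (not SAS's internal implementation): build dummy (0/1) or
# effect (-1/0/1) codes by hand and fit ordinary least squares.
# code_columns and ols are hypothetical helpers; the y values are made up.

def code_columns(values, reference, scheme):
    """Return design-matrix rows: an intercept column plus k-1 coded columns."""
    levels = sorted(set(values))
    others = [lv for lv in levels if lv != reference]
    rows = []
    for v in values:
        if scheme == "dummy":
            # Reference level is all zeros; each other level gets its own 0/1 column.
            cols = [1.0 if v == lv else 0.0 for lv in others]
        elif scheme == "effect":
            # Reference level is -1 in every column instead of 0.
            if v == reference:
                cols = [-1.0] * len(others)
            else:
                cols = [1.0 if v == lv else 0.0 for lv in others]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        rows.append([1.0] + cols)
    return rows


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * c for x, c in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical outcome values: 4 females, then 7 males.
gender = ["F"] * 4 + ["M"] * 7
y = [5.0, 6.0, 7.0, 6.0, 8.0, 9.0, 10.0, 9.0, 8.0, 10.0, 9.0]

b_dummy = ols(code_columns(gender, "F", "dummy"), y)
b_effect = ols(code_columns(gender, "F", "effect"), y)
grand_mean = sum(y) / len(y)

print(f"dummy:  intercept={b_dummy[0]:.2f}, slope={b_dummy[1]:.2f}")
print(f"effect: intercept={b_effect[0]:.2f}, slope={b_effect[1]:.2f}")
print(f"grand mean={grand_mean:.2f}")
# Prints:
# dummy:  intercept=6.00, slope=3.00
# effect: intercept=7.50, slope=1.50
# grand mean=7.91
```

With dummy coding (females as reference), the intercept is the female mean (6) and the slope is the male-minus-female difference (3). With effect coding, the intercept is the unweighted average of the two group means (7.5), not the grand mean of all 11 observations (about 7.91). Both codings reproduce the same fitted group means; unequal group sizes only change how the coefficients should be read.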

If you provide more detail on what analyses you plan to undertake, someone may be able to provide more specific advice.

Hope this is helpful,


Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204

Received on Thu 03 Apr 2008 - 21:50:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Apr 2008 - 22:30:26 GMT.
