Re: [R] A regression problem using dummy variables

From: Ron Michael <ron_michael70_at_yahoo.com>
Date: Thu, 03 Jul 2008 02:19:21 -0700 (PDT)


Can you clarify what you mean by "which group contrasts you want to look at"?

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Subject: Re: [R] A regression problem using dummy variables
To: "rlearner309" <unixunix99_at_gmail.com>
Cc: r-help_at_r-project.org
Date: Thursday, 3 July, 2008, 2:57 PM

rlearner309 wrote:
> I think it is zero, because you have lots of zeros there. It is not like
> continuous variables.
>
>
Think again. The sum of products may be zero, but that is not the covariance. And don't dismiss Thomas, he is usually right.
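
The distinction can be checked directly in R. The following is only a sketch with made-up data: the group names "a", "b", "c" and the group sizes are assumptions chosen to mimic the situation described in the thread, not the original poster's data.

```r
# Hypothetical data: three mutually exclusive groups, one of them tiny
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))

# Indicator (dummy) columns for levels "b" and "c"
d_b <- as.numeric(g == "b")
d_c <- as.numeric(g == "c")

sum(d_b * d_c)   # the two indicators are never 1 together, so this is 0
cov(d_b, d_c)    # cov = (sum(x * y) - n * mean(x) * mean(y)) / (n - 1)
cor(d_b, d_c)    # same sign as the covariance
```

The sum of products is zero, but cov() also subtracts the product of the means, and both means are positive, so the covariance comes out negative.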

Anyway, the coefficients of dummy variables represent differences from the same base level, and choosing a poorly determined base level (essentially, one whose mean is determined by only a few observations) will cause high parameter correlation. It should only affect those parameters, though, and it is not really clear what VIF means for dummy variables. One often chooses to relevel() to make the largest group the base level, but it really comes down to which group contrasts you want to look at.
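
The effect of the base level can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, and the level names, group sizes, seed, and object names (fit_small_base, fit_large_base) are all hypothetical.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical data: group "a" has 3 observations, "b" and "c" have 1000 each
g <- factor(rep(c("a", "b", "c"), times = c(3, 1000, 1000)))
y <- rnorm(length(g))

# Default coding: "a" (the tiny group) is the base level
fit_small_base <- lm(y ~ g)

# relevel() makes a large group the base level instead
fit_large_base <- lm(y ~ relevel(g, ref = "b"))

# Correlation between the two dummy coefficients under each coding
cov2cor(vcov(fit_small_base))[2:3, 2:3]
cov2cor(vcov(fit_large_base))[2:3, 2:3]

# Standard errors under each coding
summary(fit_small_base)$coefficients[, "Std. Error"]
summary(fit_large_base)$coefficients[, "Std. Error"]
```

With the tiny group as base, both dummy coefficients are differences from the same poorly estimated mean, so they are almost perfectly correlated and both have large standard errors. After releveling, only the coefficient for the tiny group stays imprecise. The fitted values are the same in both cases; only the parameterization changes.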

>
> Thomas Lumley wrote:
>
>> On Wed, 2 Jul 2008, rlearner309 wrote:
>>
>>
>>> I think the covariance between dummy variables or between dummy variables
>>> and intercept should always be zero. meaning: no singularity problem??
>>>
>>>
>> No. You can easily check that this is not true using the cov() function.
>> Indicator variables for mutually exclusive groups are negatively
>> correlated.
>>
>> -thomas
>>
>>
>>
>>
>>> rlearner309 wrote:
>>>
>>>> This is actually more like a Statistics problem:
>>>> I have a dataset with two dummy variables controlling three levels. The
>>>> problem is, one level does not have many observations compared with the
>>>> other two levels (a couple of data points compared with 1000+ points on
>>>> the other levels). When I run the regression, the result is bad. I have
>>>> unbalanced SE and VIF. Does this kind of problem also belong to the
>>>> "near singularity" problem? Does it make any difference if I code the
>>>> level that lacks data (0,0) instead of (0,1)?
>>>>
>>>> thanks a lot!
>>>>
>>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> Thomas Lumley Assoc. Professor, Biostatistics
>> tlumley_at_u.washington.edu University of Washington, Seattle
>>
>>
>>
>>
>
>

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907


Received on Thu 03 Jul 2008 - 09:47:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jul 2008 - 15:01:50 GMT.

