Re: [R] A regression problem using dummy variables

From: rlearner309 <unixunix99_at_gmail.com>
Date: Thu, 03 Jul 2008 08:30:57 -0700 (PDT)

Sorry, I made a stupid mistake.
I've got it now.
Thanks a lot!

Peter Dalgaard wrote:
>
> rlearner309 wrote:
>> I think it is zero, because you have lots of zeros there.  It is not like
>> continuous variables.
>>
> Think again. The sum of products may be zero, but that is not the
> covariance. And don't dismiss Thomas, he is usually right.
>
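
A minimal sketch in R of the distinction between the sum of products and the covariance, assuming made-up group sizes (1000, 1000 and 3) that only mimic the situation described in this thread. The names g, d_b and d_c are hypothetical and not from the original data.

```r
# Hypothetical data: three mutually exclusive groups, one of them tiny
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))

# Indicator (dummy) columns for groups "b" and "c"; "a" is the base level
d_b <- as.numeric(g == "b")
d_c <- as.numeric(g == "c")

# Sum of products: no observation belongs to both groups, so every term is 0
sp <- sum(d_b * d_c)

# Covariance: the means are subtracted before multiplying, so it is not 0
cv <- cov(d_b, d_c)

print(sp)
print(cv)            # negative for mutually exclusive groups
print(cor(d_b, d_c))
```

The covariance is the sum of products minus n times the product of the two means, divided by n - 1. With a zero sum of products and positive means, it comes out negative.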
> Anyway, the coefs of dummy variables represent differences from the same
> base level, and choosing a poorly determined base level (essentially: one
> whose mean is determined by only a few observations) will cause high
> parameter correlation. It should only affect those parameters though,
> and it is not really clear what VIF means for dummy variables. One often
> chooses to relevel() to make the largest group the base level, but it
> really comes down to which group contrasts you want to look at.
>
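
A rough sketch in R of how the choice of base level changes the parameter correlation, assuming simulated data with made-up group sizes (1000, 1000 and 3) and a response with no real group effect. The names g, g_small, g_large, fit_small and fit_large are hypothetical.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical factor: two large groups and one tiny group
g <- factor(rep(c("a", "b", "c"), times = c(1000, 1000, 3)))
y <- rnorm(length(g))  # simulated response, no true group differences

# Base level = tiny group "c": both dummy coefficients are differences
# from a mean that is estimated on only 3 observations
g_small <- relevel(g, ref = "c")
fit_small <- lm(y ~ g_small)

# Base level = large group "a"
g_large <- relevel(g, ref = "a")
fit_large <- lm(y ~ g_large)

# Correlation between the two dummy coefficients under each coding
r_small <- cov2cor(vcov(fit_small))[2, 3]
r_large <- cov2cor(vcov(fit_large))[2, 3]
print(c(small_base = r_small, large_base = r_large))

# Standard errors of the coefficients under each coding
print(coef(summary(fit_small))[, "Std. Error"])
print(coef(summary(fit_large))[, "Std. Error"])
```

Both fits describe the same model and give the same fitted values. Only the contrasts change, and with them the standard errors and the correlation between the estimates.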
>
>>
>> Thomas Lumley wrote:
>>   
>>> On Wed, 2 Jul 2008, rlearner309 wrote:
>>>
>>>     
>>>> I think the covariance between dummy variables, or between dummy variables
>>>> and the intercept, should always be zero.  Meaning: no singularity problem?
>>>>
>>> No.  You can easily check that this is not true using the cov() function.
>>> Indicator variables for mutually exclusive groups are negatively correlated.
>>>
>>>      -thomas
>>>
>>>
>>>
>>>     
>>>> rlearner309 wrote:
>>>>       
>>>>> This is actually more of a statistics problem:
>>>>> I have a dataset with two dummy variables coding three levels.  The
>>>>> problem is that one level has very few observations compared with the
>>>>> other two (a couple of data points versus 1000+ points on the other
>>>>> levels).  When I run the regression, the result is bad: I get
>>>>> unbalanced SEs and VIFs.  Is this also a "near singularity" problem?
>>>>> Does it make any difference if I code the level that lacks data as
>>>>> (0,0) instead of (0,1)?
>>>>>
>>>>> thanks a lot!
>>>>>
>>>>>         
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>       
>>> Thomas Lumley			Assoc. Professor, Biostatistics
>>> tlumley_at_u.washington.edu	University of Washington, Seattle
>>>
>>>
>>>
>>>     
>>
>>   

>
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk) FAX: (+45) 35327907
>
>
>
-- 
View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18260470.html
Sent from the R help mailing list archive at Nabble.com.

Received on Thu 03 Jul 2008 - 15:46:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jul 2008 - 16:31:02 GMT.
