Re: [R] Discretize continous variables....

From: Daniel Malter <daniel_at_umd.edu>
Date: Sat, 19 Jul 2008 07:50:14 -0700 (PDT)

This time I agree with Rolf Turner: this sounds like homework. Whether it is or not, type

?ifelse

at the R prompt.
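
As a minimal sketch of what that help page leads to (the variable `age`, the cutoffs, and the labels are made up for illustration and are not from the original question), ifelse() gives a two-group split and the base function cut() handles more than two groups:

```r
# Hypothetical data: 'age' and the cutoffs below are illustrative assumptions.
age <- c(23, 35, 40, 58, 61)

# Two groups with ifelse(): values strictly above 40 become "high".
age_group <- ifelse(age > 40, "high", "low")

# More than two groups with cut(): intervals are right-closed by default,
# so the value 40 falls into the (30, 40] bin.
age_bins <- cut(age,
                breaks = c(-Inf, 30, 40, Inf),
                labels = c("young", "middle", "old"))

print(data.frame(age, age_group, age_bins))
```

Note that the cutoffs here are fixed in advance rather than chosen to maximize some criterion against the response.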

Frank is right that discretizing leads to a loss of information. However, I think the result remains interpretable. Further, it is common practice in certain fields, and it may be a reasonable way to check whether outliers in X are mostly driving your results (although other approaches are available for that as well). The main underlying question, however, should be: do you have reason to expect that the response differs between the groups you create, rather than along the values of the continuous variable?
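
One rough way to look at that question is to fit the model both ways and compare. The sketch below uses simulated data and base glm() instead of lrm() so that it runs without extra packages; the sample size, the effect size, and the median split are all assumptions made for illustration:

```r
# Simulated data (an assumption for illustration, not the poster's data).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.8 * x))

# Logistic regression with X kept continuous.
fit_cont <- glm(y ~ x, family = binomial)

# Same model with X dichotomized at its median.
x_grp <- ifelse(x > median(x), "high", "low")
fit_grp <- glm(y ~ x_grp, family = binomial)

# Compare the two fits; a lower AIC indicates the better-fitting model.
print(AIC(fit_cont, fit_grp))
```

A comparison like this only describes fit on the data at hand; it does not replace a conceptual reason for the grouping.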

Regarding question 2: I thought you meant that you want to reduce the number of levels (say 4) to a smaller number (say 2) for one of your independent variables (i.e. one of the Xs), not Y. This makes sense only if there is a good conceptual reason to group these categories, not just to get significance.
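
If such a conceptual grouping exists, merging the levels of a factor is short in base R. A minimal sketch, with hypothetical level names and a hypothetical grouping:

```r
# Hypothetical four-level predictor; the level names are illustrative.
x <- factor(c("a", "b", "c", "d", "a", "c"))

# Merge levels using a grouping decided on conceptual grounds beforehand,
# without looking at the response Y.
x2 <- x
levels(x2) <- list(ab = c("a", "b"), cd = c("c", "d"))

# Cross-tabulate to check that each old level maps to the intended new one.
print(table(old = x, new = x2))
```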

Best,
Daniel

Frank E Harrell Jr wrote:

> 
> milicic.marko wrote:

>> Hi R helpers,
>>
>>
>> I'm preparing a dataset to fit a logistic regression model with lrm(). I
>> have various continuous and discrete variables and I would like to:
>>
>> 1. Optimally discretize continuous variables (optimally meaning maximizing
>> information value - IV, for example)
> 
> This will result in effects in the model that cannot be interpreted and 
> will ruin the statistical inference from the lrm.  It will also hurt 
> predictive discrimination.  You seem to be allergic to continuous
> variables.
> 

>> 2. Regroup discrete variables to achieve perhaps a smaller number of
>> levels and better information value...
> 
> If you use the Y variable to do this the same problems will result. 
> Shrinkage is a better approach, or using marginal frequencies to combine 
> levels.  See the "pre-specification of complexity" strategy in my book 
> Regression Modeling Strategies.
> 
> Frank
> 

>>
>>
>> Please suggest if there is some package providing this or similar
>> functionality for discretization...
>>
>>
>> If there is no package, please suggest how to achieve this.
>>
>>
>>
>>
>> Many thanks helpers.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> -- 
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University
> 

-- 
View this message in context: http://www.nabble.com/Discretize-continous-variables....-tp18544453p18545292.html
Sent from the R help mailing list archive at Nabble.com.

Received on Sat 19 Jul 2008 - 14:58:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 19 Jul 2008 - 16:31:49 GMT.

