# Re: [R] Discretize continuous variables....

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Sat, 19 Jul 2008 13:03:01 -0500

> True. Thanks for the clarification. Is your conclusion from that that the
> findings in such a case should only be interpreted in the specific context
> (with the awareness that they do not apply to changing contexts) or that
> such an approach should not be taken at all?

Frank

>
> Frank E Harrell Jr wrote:

>> Daniel Malter wrote:
>>> This time I agree with Rolf Turner. This sounds like homework. Whether or
>>> not it is, type
>>>
>>> ?ifelse
>>>
>>> at the R prompt.
>>>
>>> Frank is right that it leads to a loss of information. However, I think
>>> the result remains interpretable. Further, it is common practice in certain
>>> fields, and
>> I have to disagree. It is easy to show that odds ratios so obtained are
>> functions of the entire distribution of the predictor in question. Thus
>> they do not estimate a scientific quantity (something that can be
>> interpreted out of context). For example if age is cut at 65 and one
>> were to add to the sample several subjects aged 100, the >=65 : <65 odds
>> ratio would change even if the age effect did not.
>>
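Frank's point about dichotomized age can be checked numerically. The minimal Python sketch below (not R, and not code from the thread) assumes a hypothetical logistic model whose per-year age effect is fixed, then computes the >=65 : <65 odds ratio that a dichotomized analysis would estimate in a very large sample: once for ages 40 to 80, and once after adding five subjects aged 100. The coefficients, the age range and the helper names are illustrative assumptions.

```python
import math

# Hypothetical "true" model, assumed for illustration only:
# logit P(Y=1 | age) = -6 + 0.08 * age, the same age effect in every sample.
INTERCEPT = -6.0
SLOPE = 0.08
CUTOFF = 65


def prob(age):
    """P(Y=1) at a given age under the assumed logistic model."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * age)))


def dichotomized_odds_ratio(ages, cutoff=CUTOFF):
    """Odds ratio for age >= cutoff vs age < cutoff implied by the model,
    i.e. what a 2x2 table converges to with many subjects at each listed age."""
    high = [prob(a) for a in ages if a >= cutoff]
    low = [prob(a) for a in ages if a < cutoff]
    p1 = sum(high) / len(high)  # average risk in the >= cutoff group
    p0 = sum(low) / len(low)    # average risk in the < cutoff group
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


ages = list(range(40, 81))        # ages 40..80, equally represented
ages_plus_old = ages + [100] * 5  # same sample plus five subjects aged 100

or_before = dichotomized_odds_ratio(ages)
or_after = dichotomized_odds_ratio(ages_plus_old)
per_year_or = math.exp(SLOPE)     # the per-year odds ratio is unchanged

print(f"per-year OR:          {per_year_or:.3f}")
print(f">=65 : <65 OR before: {or_before:.2f}")
print(f">=65 : <65 OR after:  {or_after:.2f}")
```

Because the per-year effect is held fixed, any change in the dichotomized odds ratio comes purely from the change in the age distribution. In R, the dichotomy itself would be something like `ifelse(age >= 65, 1, 0)`, the function Daniel points to.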
>>> it may be a reasonable way to check whether your results are driven mostly
>>> by outliers in X (although other approaches are available for that as well).
>>> The main underlying question, however, should be: do you have reason to
>>> expect that the response differs between the groups you create, rather than
>>> varying with the values of the continuous variable?
>> Regression splines can help. Sometimes the splines are stated in terms
>> of the cube root of the predictor to avoid excess influence.
>>
>> Frank
>>
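A restricted cubic spline keeps the predictor continuous while allowing a nonlinear effect. The Python sketch below builds a restricted (natural) cubic spline basis for one predictor, evaluated on the cube-root scale as Frank mentions. In R, `rcs()` from Harrell's Design package (later rms) generates such terms inside an `lrm()` formula; the helper names, knot locations and data values here are hypothetical, and scaling the nonlinear terms by (t_k - t_1)^2 is one common convention, not necessarily the exact one rms uses.

```python
import math


def pos_cube(u):
    """(u)_+^3: cube of the positive part of u."""
    return max(u, 0.0) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis at a scalar x.

    Returns [x, C_1(x), ..., C_{k-2}(x)] for k knots. The fitted curve is
    cubic between knots and linear beyond the outer knots. Nonlinear terms
    are divided by (t_k - t_1)**2 (an assumed normalisation)."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    scale = (t[-1] - t[0]) ** 2
    d = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        c = (pos_cube(x - t[j])
             - pos_cube(x - t[-2]) * (t[-1] - t[j]) / d
             + pos_cube(x - t[-1]) * (t[-2] - t[j]) / d)
        terms.append(c / scale)
    return terms


def cube_root(x):
    """Real cube root, also defined for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Hypothetical usage: a spline in the cube root of a skewed predictor,
# with knots placed on the transformed scale (assumed locations).
raw = [1.0, 8.0, 27.0, 64.0, 125.0, 1000.0]
knots = [1.5, 3.0, 4.5, 6.0]
rows = [rcs_basis(cube_root(v), knots) for v in raw]
```

Taking the cube root first pulls in extreme values such as 1000, so they have less leverage on the fit, which is the motivation Frank gives.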
>>> Regarding question 2: I thought you meant that you want to reduce the
>>> number of levels (say, 4) to a smaller number of levels (say, 2) for one of
>>> your independent variables (i.e. one of the Xs), not Y. This makes sense
>>> only if there is a good conceptual reason to group these categories, not
>>> just to get significance.
>>>
>>> Best,
>>> Daniel
>>>
>>> Frank E Harrell Jr wrote:
>>>> milicic.marko wrote:
>>>>> Hi R helpers,
>>>>>
>>>>> I'm preparing a dataset to fit a logistic regression model with lrm(). I
>>>>> have various continuous and discrete variables and I would like to:
>>>>>
>>>>> 1. Optimally discretize continuous variables ("optimally" meaning, for
>>>>> example, maximizing information value, IV)
>>>> This will result in effects in the model that cannot be interpreted and
>>>> will ruin the statistical inference from the lrm. It will also hurt
>>>> predictive discrimination. You seem to be allergic to continuous
>>>> variables.
>>>>
>>>>> 2. Regroup discrete variables to achieve a smaller number of levels and,
>>>>> perhaps, a better information value.
>>>> If you use the Y variable to do this the same problems will result.
>>>> Shrinkage is a better approach, or using marginal frequencies to combine
>>>> levels. See the "pre-specification of complexity" strategy in my book
>>>> Regression Modeling Strategies.
>>>>
>>>> Frank
>>>>
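Combining levels using marginal frequencies, one of the alternatives Frank mentions, can be sketched as follows. This Python example is an assumption about one simple way to do it, not the procedure from Regression Modeling Strategies: it pools every level of a categorical predictor whose count falls below a chosen threshold. The threshold, the pooled label and the toy data are hypothetical.

```python
from collections import Counter


def pool_rare_levels(values, min_count, other="other"):
    """Merge levels whose marginal frequency is below min_count into one
    pooled level. Only the predictor's own counts are used; the response Y
    is never consulted. Assumes `other` is not already a level name."""
    counts = Counter(values)
    keep = {lvl for lvl, n in counts.items() if n >= min_count}
    return [v if v in keep else other for v in values]


# Hypothetical predictor with two rare levels
occupation = ["clerk"] * 40 + ["nurse"] * 35 + ["pilot"] * 3 + ["miner"] * 2
pooled = pool_rare_levels(occupation, min_count=10)
print(Counter(pooled))
```

Because the rule looks only at how often each level occurs, it can be applied before fitting lrm() without touching Y. Shrinkage (penalized estimation), the other option Frank mentions, is not shown here.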
>>>>> Please let me know if there is a package that provides this, or similar
>>>>> discretization functionality.
>>>>>
>>>>> If there is no such package, please suggest how to achieve this.
>>>>>
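For reference, the information value (IV) the original poster wants to maximize is often computed as below. This Python sketch uses the common credit-scoring definition, which is an assumption since the post does not define IV; the helper name and toy counts are hypothetical. Searching for the cutpoints that maximize this quantity uses Y, which is exactly what Frank says will "ruin the statistical inference from the lrm".

```python
import math


def information_value(bins, y):
    """Information value of a binned predictor for a binary response.

    bins: bin label for each subject; y: 0/1 response for each subject.
    IV = sum over bins of (p1 - p0) * ln(p1 / p0), where p1 (p0) is the
    share of all y=1 (y=0) subjects that fall in the bin."""
    n1 = sum(y)
    n0 = len(y) - n1
    iv = 0.0
    for b in set(bins):
        c1 = sum(1 for bi, yi in zip(bins, y) if bi == b and yi == 1)
        c0 = sum(1 for bi, yi in zip(bins, y) if bi == b and yi == 0)
        if c1 == 0 or c0 == 0:
            # ln(p1 / p0) is undefined when a bin has no events or no non-events
            raise ValueError(f"bin {b!r} has no events or no non-events")
        p1, p0 = c1 / n1, c0 / n0
        iv += (p1 - p0) * math.log(p1 / p0)
    return iv


# Hypothetical two-bin split: "lo" has 10 events / 30 non-events,
# "hi" has 30 events / 10 non-events.
bins = ["lo"] * 40 + ["hi"] * 40
y = [1] * 10 + [0] * 30 + [1] * 30 + [0] * 10
iv = information_value(bins, y)
```

For this toy split each bin contributes 0.5 * ln(3), so the total is ln(3).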

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 19 Jul 2008 - 18:06:35 GMT
