From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>

Date: Sat, 19 Jul 2008 13:03:01 -0500

>> Daniel Malter wrote:
>>> This time I agree with Rolf Turner. This sounds like homework. Whether or
>>> not, type
>>>
>>> ?ifelse
>>>
>>> at the R prompt.
>>>
>>> Frank is right, it leads to a loss of information. However, I think it
>>> remains interpretable. Further, it is common practice in certain fields,
>>> and

>> I have to disagree. It is easy to show that odds ratios so obtained are
>> functions of the entire distribution of the predictor in question. Thus
>> they do not estimate a scientific quantity (something that can be
>> interpreted out of context). For example if age is cut at 65 and one
>> were to add to the sample several subjects aged 100, the >=65 : <65 odds
>> ratio would change even if the age effect did not.
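
To make the age example above concrete, here is a minimal sketch in base R. It assumes a hypothetical true model in which the log-odds of Y=1 are linear in age; the intercept (-5), the slope (0.06), and the ages in the sample are all invented for illustration. The odds ratio is computed from expected probabilities rather than from fitted data, and the helper names `p_true` and `dichotomized_or` are not from the thread.

```r
# Assumed true model (illustration only): log-odds of Y = 1 are linear in age.
p_true <- function(age) plogis(-5 + 0.06 * age)

# Odds ratio for age >= 65 versus age < 65, based on the expected event
# probability within each group for a given set of ages.
dichotomized_or <- function(age) {
  p_hi <- mean(p_true(age[age >= 65]))
  p_lo <- mean(p_true(age[age < 65]))
  (p_hi / (1 - p_hi)) / (p_lo / (1 - p_lo))
}

age      <- seq(40, 80, by = 1)      # hypothetical sample: one subject per year of age
age_plus <- c(age, rep(100, 10))     # same sample plus ten subjects aged 100

or_before <- dichotomized_or(age)
or_after  <- dichotomized_or(age_plus)
print(c(before = or_before, after = or_after))
```

The per-year age effect, exp(0.06), is identical in both calls. Only the age distribution differs, yet the >=65 : <65 odds ratio moves.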

>>> it may be a reasonable way to check whether mostly outliers in the X drive
>>> your results (although other approaches are available for that as well).
>>> The main underlying question, however, should be: do you have reason to
>>> expect that the response differs by the groups you create rather than
>>> with the values of the continuous variable?

>> Regression splines can help. Sometimes the splines are stated in terms
>> of the cube root of the predictor to avoid excess influence.
>>
>> Frank
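
A rough sketch of the spline suggestion above, using only packages that ship with R. The data are simulated and the variable names are hypothetical; `glm()` with `splines::ns()` stands in here for the `lrm()` fit discussed in the thread, and the choice of 3 degrees of freedom is an arbitrary assumption.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

set.seed(1)                                   # simulated data, for illustration only
n <- 500
x <- rexp(n, rate = 1 / 20)                   # hypothetical right-skewed predictor
y <- rbinom(n, 1, plogis(-1 + 0.8 * x^(1/3)))

# Natural cubic spline on the cube-root scale: x stays continuous, and the
# transformation pulls in extreme values so they carry less influence.
fit <- glm(y ~ ns(x^(1/3), df = 3), family = binomial)

# Predicted probabilities at a few values of x
grid <- data.frame(x = c(1, 10, 50, 150))
pred <- predict(fit, newdata = grid, type = "response")
print(cbind(grid, pred))
```

No cutpoint is chosen anywhere: the fitted curve gives a separate estimate of Pr(Y=1) at each value of x.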

>>> Regarding question 2: I thought you meant that you want to reduce the
>>> number of levels (say 4) to a smaller number of levels (say 2) for one of
>>> your independent variables (i.e. one of the Xs), not Y. This makes sense
>>> only if there is a good conceptual reason to group these categories, not
>>> just to get significance.
>>>
>>> Best,
>>> Daniel

>>> Frank E Harrell Jr wrote:
>>>> milicic.marko wrote:
>>>>> Hi R helpers,
>>>>>
>>>>> I'm preparing a dataset to fit a logistic regression model with lrm(). I
>>>>> have various continuous and discrete variables and I would like to:
>>>>>
>>>>> 1. Optimally discretize continuous variables (optimally means maximizing
>>>>> information value - IV, for example)
>>>> This will result in effects in the model that cannot be interpreted and
>>>> will ruin the statistical inference from the lrm. It will also hurt
>>>> predictive discrimination. You seem to be allergic to continuous
>>>> variables.
>>>>
>>>>> 2. Regroup discrete variables to achieve perhaps a smaller number of
>>>>> levels and better information value...
>>>> If you use the Y variable to do this the same problems will result.
>>>> Shrinkage is a better approach, or using marginal frequencies to combine
>>>> levels. See the "pre-specification of complexity" strategy in my book
>>>> Regression Modeling Strategies.
>>>>
>>>> Frank
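
One way to read "using marginal frequencies to combine levels" is to pool sparse levels based only on how often each level occurs, never looking at Y. The base-R sketch below illustrates that reading; the factor, the 5% threshold, the "OTHER" label, and the helper name `pool_rare_levels` are all assumptions made for the example, not something specified in the thread.

```r
# Hypothetical categorical predictor with two sparse levels ("c" and "d")
x <- factor(c(rep("a", 50), rep("b", 30), rep("c", 3), rep("d", 2), rep("e", 15)))

# Pool every level whose relative frequency is below min_prop into one level.
# Only the marginal distribution of the predictor is used; Y plays no part.
pool_rare_levels <- function(f, min_prop = 0.05, other = "OTHER") {
  props <- prop.table(table(f))
  rare  <- names(props)[props < min_prop]
  levels(f)[levels(f) %in% rare] <- other   # duplicate level names are merged
  f
}

x_pooled <- pool_rare_levels(x)
print(table(x_pooled))
```

Because the grouping rule ignores the response, it does not distort later inference the way outcome-driven regrouping does.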

>>>>> Please suggest if there is some package providing this or equivalent
>>>>> functionality for discretization...
>>>>>
>>>>> If there is no package, please suggest how to achieve this.

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat 19 Jul 2008 - 18:06:35 GMT


> True. Thanks for the clarification. Is your conclusion from that that the
> findings in such case should only be interpreted in the specific context
> (with the awareness that it does not apply to changing contexts) or that
> such an approach should not be taken at all?

The latter, in general; in specific cases the former. But even then why condition on incomplete information when complete information is available? I.e., why compute Pr(Y=1 | X>x) in place of Pr(Y=1 | X=x)?

Frank
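
The difference between Pr(Y=1 | X>x) and Pr(Y=1 | X=x) in the reply above can be sketched numerically. This assumes a hypothetical logistic model (intercept -5 and slope 0.06, both invented) and a group of subjects spread evenly over ages 65 to 90; the names `p_model`, `p_group`, and `p_at` are illustrative only.

```r
# Assumed model for illustration: Pr(Y = 1 | X = x) under a linear logit.
p_model <- function(x) plogis(-5 + 0.06 * x)

# Pr(Y = 1 | X >= 65): a single number for everyone in the group,
# here averaged over hypothetical ages 65, 66, ..., 90.
p_group <- mean(p_model(65:90))

# Pr(Y = 1 | X = x): a separate number for each age of interest.
ages <- c(65, 75, 90)
p_at <- setNames(p_model(ages), ages)

print(p_group)
print(p_at)
```

The group-level figure sits between the values at ages 65 and 90, so it describes neither subject well, even though the information needed to distinguish them is already in the data.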



Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
