*>>> This time I agree with Rolf Turner. This sounds like homework. Whether or
*>>> not, type
*>>>
*>>> ?ifelse
*>>>
*>>> in the R-prompt.
*>>>
*>>> Frank is right, it leads to a loss in information. However, I think it
*>>> remains interpretable. Further, it is common practice in certain fields,
*>>> and
*>> I have to disagree. It is easy to show that odds ratios so obtained are
*>> functions of the entire distribution of the predictor in question. Thus
*>> they do not estimate a scientific quantity (something that can be
*>> interpreted out of context). For example if age is cut at 65 and one
*>> were to add to the sample several subjects aged 100, the >=65 : <65 odds
*>> ratio would change even if the age effect did not.
*>>
*>>> it may be a reasonable way to check whether outliers in X are mostly
*>>> driving your results (although other approaches are available for that
*>>> as well). The main underlying question, however, should be: do you have
*>>> reason to expect that the response differs between the groups you
*>>> create rather than along the values of the continuous variable?
*>> Regression splines can help. Sometimes the splines are stated in terms
*>> of the cube root of the predictor to avoid excess influence.
*>>
*>> Frank
*>>
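A minimal sketch of the regression-spline suggestion above, on simulated data. It is an illustration under stated assumptions, not the poster's own code: the data-generating model is made up, and ns() from the splines package (which ships with R) stands in for whatever spline basis one would use with lrm().

```r
library(splines)  # part of the standard R distribution

set.seed(2)
n   <- 2000
age <- runif(n, 20, 90)
# Hypothetical smooth, nonlinear age effect, used only to generate example data.
y   <- rbinom(n, 1, plogis(-3 + 0.0008 * (age - 20)^2))

# Natural cubic spline in age with 3 degrees of freedom.
fit_spline <- glm(y ~ ns(age, df = 3), family = binomial)

# Same spline, stated in terms of the cube root of age.
fit_cuberoot <- glm(y ~ ns(age^(1/3), df = 3), family = binomial)

# Age dichotomized at 65, for comparison.
fit_cut <- glm(y ~ I(age >= 65), family = binomial)

# Lower AIC indicates a better trade-off between fit and complexity.
aic_table <- AIC(fit_cut, fit_spline, fit_cuberoot)
print(aic_table)
```

Both spline fits keep age continuous, so the comparison with the dichotomized fit shows how much information the cut discards in this particular simulation.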
*>>> Regarding question 2: I thought you meant that you want to reduce the
*>>> number of levels (say 4) to a smaller number (say 2) for one of your
*>>> independent variables (i.e. one of the Xs), not Y. This makes sense only
*>>> if there is a good conceptual reason to group these categories, not just
*>>> to get significance.
*>>>
*>>> Best,
*>>> Daniel
*>>>
*>>>
*>>>
*>>>
*>>>
*>>> Frank E Harrell Jr wrote:
*>>>> milicic.marko wrote:
*>>>>> Hi R helpers,
*>>>>>
*>>>>>
*>>>>> I'm preparing a dataset to fit a logistic regression model with lrm(). I
*>>>>> have various continuous and discrete variables and I would like to:
*>>>>>
*>>>>> 1. Optimally discretize continuous variables (optimally meaning maximizing
*>>>>> information value, IV, for example)
*>>>> This will result in effects in the model that cannot be interpreted and
*>>>> will ruin the statistical inference from the lrm. It will also hurt
*>>>> predictive discrimination. You seem to be allergic to continuous
*>>>> variables.
*>>>>
*>>>>> 2. Regroup discrete variables to achieve perhaps a smaller number of
*>>>>> levels and better information value...
*>>>> If you use the Y variable to do this the same problems will result.
*>>>> Shrinkage is a better approach, or using marginal frequencies to combine
*>>>> levels. See the "pre-specification of complexity" strategy in my book
*>>>> Regression Modeling Strategies.
*>>>>
*>>>> Frank
*>>>>
*>>>>> Please suggest if there is some package providing this or similar
*>>>>> functionality for discretization...
*>>>>>
*>>>>>
*>>>>> If there is no package, please suggest how to achieve this.
*>>>>>
*>>>>>
*>> --
*>>
> True. Thanks for the clarification. Is your conclusion from that that the
> findings in such case should only be interpreted in the specific context
> (with the awareness that it does not apply to changing contexts) or that
> such an approach should not be taken at all?

The latter, in general; in specific cases the former. But even then why condition on incomplete information when complete information is available? I.e., why compute Pr(Y=1 | X>x) in place of Pr(Y=1 | X=x)?

Frank
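The age example from earlier in the thread (cut at 65, then add subjects aged 100) can be checked with a small simulation. This is only a sketch under assumed values: the sample sizes, the intercept of -4, and the slope of 0.05 per year are invented for illustration, and plain glm() is used in place of lrm() so that no extra package is needed.

```r
set.seed(1)
n    <- 5000
beta <- 0.05                      # assumed true log-odds change per year of age
age  <- runif(n, 40, 90)
y    <- rbinom(n, 1, plogis(-4 + beta * age))

# Add subjects aged 100 who follow exactly the same age effect.
m    <- 1000
age2 <- c(age, rep(100, m))
y2   <- c(y, rbinom(m, 1, plogis(-4 + beta * 100)))

# Odds ratio for age >= 65 versus age < 65 after dichotomizing.
or_cut <- function(age, y) {
  old <- ifelse(age >= 65, 1, 0)
  unname(exp(coef(glm(y ~ old, family = binomial))["old"]))
}

# Estimated slope per year when age is kept continuous.
slope <- function(age, y) {
  unname(coef(glm(y ~ age, family = binomial))["age"])
}

or_before    <- or_cut(age, y)
or_after     <- or_cut(age2, y2)
slope_before <- slope(age, y)
slope_after  <- slope(age2, y2)

print(c(or_before = or_before, or_after = or_after))
print(c(slope_before = slope_before, slope_after = slope_after))
```

In this setup the >=65 : <65 odds ratio moves when the aged-100 subjects are added, while the continuous slope stays close to the assumed 0.05, which is the sense in which the dichotomized odds ratio depends on the distribution of the predictor.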

> > > Frank E Harrell Jr wrote:

>> Daniel Malter wrote:

