Re: [R] predictive accuracy

From: Mike Marchywka
Date: Thu, 26 May 2011 20:55:28 -0400

> Date: Thu, 26 May 2011 13:50:15 -0700
> Subject: Re: [R] predictive accuracy
> 1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little generic "how do I REALLY use this tool" discussion may benefit everyone here. A mailing list for a certain brand of hammer may well discuss various uses, types of nails, etc.

Personally, I have an interest in this. If the OP will post the data, it may be possible to explore some analysis options.

> 2. You are wading in an alligator infested swamp. Get help from
> (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. We don't know the situation: an intern, someone looking for outside ideas after exhausting internal ones, someone with specific issues with internal peers, a summer student not wishing to bother everyone there for details, etc.

> Best,
> Bert
> P.S. The answer to all your questions is "no" (imho).

> On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed
> wrote:
> > The strong predictor is the country/region where the study was
> > conducted. So it is not important/useful for a clinician to use it (as
> > long he/she is in USA or Europe).
> > Excluding that predictor will make another 2 insignificant predictors to
> > become significant!! Can the new model have a reliable predictive
> > accuracy? I thought of excluding all patients from other countries and
> > develop the model accordingly- is the exclusion of a lot of patients and
> > compromise of the power is more acceptable??

LOL, quite the contrary: post hoc selection increases the power to find whatever you or the sponsor desire...
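
To make that concrete, here is a minimal sketch, not anything derived from the OP's data. Every number in it is an assumption chosen for illustration: 8 regions, 40 patients per arm, pure-noise outcomes, and a normal-approximation two-sample test. It is written in Python with only the standard library; the same few lines port directly to R. It compares how often one pre-specified region shows p < 0.05 with how often the best-looking region, picked after the fact, does, when the drug truly does nothing anywhere.

```python
import math
import random


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def two_sided_p(x, y):
    """Two-sample z-test p-value (normal approximation)."""
    mx, vx = mean_and_variance(x)
    my, vy = mean_and_variance(y)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_trials=2000, n_regions=8, n_per_arm=40, alpha=0.05, seed=1):
    """Return (pre-specified, post hoc) false-positive rates under no effect."""
    rng = random.Random(seed)
    hits_prespecified = 0
    hits_posthoc = 0
    for _ in range(n_trials):
        p_values = []
        for _ in range(n_regions):
            # Both arms come from the same distribution: no true drug effect.
            treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            p_values.append(two_sided_p(treated, control))
        hits_prespecified += p_values[0] < alpha   # region chosen in advance
        hits_posthoc += min(p_values) < alpha      # region chosen after looking
    return hits_prespecified / n_trials, hits_posthoc / n_trials


prespecified_rate, posthoc_rate = simulate()
print("pre-specified region significant:", prespecified_rate)
print("best post hoc region significant:", posthoc_rate)
```

With no real effect, the pre-specified comparison should come out significant about 5% of the time, while the best of 8 independent looks should do so roughly 1 - 0.95^8 of the time, or about a third.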

Presuming your general interest is in finding out attributes of a given drug under various conditions, you would probably want to combine the observations with tentative thoughts on causality and see what makes the best story.

Statistical significance in isolation is a function of the data and the analysis method; it doesn't really have anything specific to do with the underlying system.

In this case, if you have other continuous prognostic factors (age, LDH, and hemoglobin come to mind), you may find non-monotonic relations between prognostic factor and outcome. Further, say you have enough patients that you could in fact map dose-response curves. It may turn out that the curve is non-monotonic, with parameters that are themselves non-monotonic in the prognostic factor. Consider

avg_survival = a + b*d - c*d^2

where d is the dose. For small d the drug seems to help, but at larger doses it makes things worse. Now suppose "c" is a complicated function of hematocrit; it is not hard to imagine that anemics and siderositic (is that a word? LOL) patients have some underlying problems dealing with your drug. These conditions may be distributed geographically, etc.

This is all stuff you can simulate in R or even on paper.
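
For instance, a rough sketch of the quadratic model above, with "c" made a function of hematocrit. Everything numeric here is made up for illustration: the constants A, B, C0, K, H0, the U-shaped form c(h) = C0 + K*(h - H0)^2, and the two regional hematocrit distributions are all assumptions, and the function names are hypothetical. It is in Python with only the standard library; it translates almost line for line to R.

```python
import random

# Hypothetical constants, chosen only to make the shape of the argument visible.
A, B = 12.0, 4.0                # baseline survival (months), linear dose benefit
C0, K, H0 = 0.5, 0.01, 42.0     # c(h) = C0 + K*(h - H0)^2, smallest at h = H0


def c_of_hematocrit(h):
    """Assumed U-shaped toxicity coefficient: worse for low and high hematocrit."""
    return C0 + K * (h - H0) ** 2


def avg_survival(d, h):
    """avg_survival = a + b*d - c*d^2, with c depending on hematocrit h."""
    return A + B * d - c_of_hematocrit(h) * d ** 2


def best_dose(h):
    """Dose that maximizes the quadratic: d* = b / (2c)."""
    return B / (2 * c_of_hematocrit(h))


def region_mean_benefit(mean_h, sd_h, dose, n=5000, seed=0):
    """Average gain over no treatment in a region with the given hematocrit spread."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        h = rng.gauss(mean_h, sd_h)
        total += avg_survival(dose, h) - avg_survival(0.0, h)
    return total / n


# Same fixed dose given everywhere; only the hematocrit distribution differs.
benefit_a = region_mean_benefit(42.0, 3.0, dose=4.0)   # mostly normal hematocrit
benefit_b = region_mean_benefit(34.0, 5.0, dose=4.0)   # many anemic patients

print("best dose at h=42:", best_dose(42.0))
print("best dose at h=30:", best_dose(30.0))
print("mean benefit, region A:", benefit_a)
print("mean benefit, region B:", benefit_b)
```

Under these assumptions the same dose helps on average in one region and hurts in the other, so "region" looks like a strong predictor even though geography plays no causal role in the model; hematocrit does.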

It sounds like you are already trying to write a label, which may be a bit premature (although I defer to the guy from DNA for that, LOL): "indicated for use in patients in the Western Hemisphere with ...."

You may have decent luck looking at FDA panel discussion transcripts, search for related general stats terms confined to ""

> > Thanks for your help...
> > Al
> >
> > -----Original Message-----
> > From: Marc Schwartz []
> > Sent: Thursday, May 26, 2011 10:54 AM
> > To: El-Tahtawy, Ahmed
> > Cc:
> > Subject: Re: [R] predictive accuracy
> >
> >
> > On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote:
> >
> >> I am trying to develop a prognostic model using logistic regression. I
> >> built a full, approximate models with the use of penalization - design
> >> package. Also, I tried Chi-square criteria, step-down techniques. Used
> >> BS for model validation.
> >>
> >> The main purpose is to develop a predictive model for future patient
> >> population. One of the strong predictor pertains to the study design
> >> and would not mean much for a clinician/investigator in real clinical
> >> situation and have been asked to remove it.
> >> Can I propose a model and nomogram without that strong -irrelevant
> >> predictor?? If yes, do I need to redo model calibration, discrimination,
> >> validation, etc...?? or just have 5 predictors instead of 6 in the
> >> prognostic model??
> >>
> >>
> >>
> >> Thanks for your help
> >>
> >> Al
> >
> >
> > Is it that the study design characteristic would not make sense to a
> > clinician but is relevant to future samples, or that the study design
> > characteristic is unique to the sample upon which the model was
> > developed and is not relevant to future samples because they will not be
> > in the same or a similar study?
> >
> > Is the study design characteristic a surrogate for other factors that
> > would be relevant to future samples? If so, you might engage in a
> > conversation with the clinicians to gain some insights into other
> > variables to consider for inclusion in the model, that might in turn,
> > help to explain the effect of the study design variable.
> >
> > Either way, if the covariate is removed, you of course need to engage in
> > fully re-evaluating the model. You cannot just drop the covariate and
> > continue to use model fit assessments made on the full model.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> > ______________________________________________
> > mailing list
> >
> > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained, reproducible code.
> >
> --
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
> -- Maimonides (1135-1204)
> Bert Gunter
> Genentech Nonclinical Biostatistics
Received on Fri 27 May 2011 - 00:57:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 31 May 2011 - 15:00:11 GMT.