Re: [R] Defining reference category for a cph model summary inside of a "for" loop

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Mon, 31 Mar 2008 09:43:05 -0500

Wells, Brian wrote:
> Frank,
>
> Thanks again, I didn't realize that continuous variables could be
> manipulated that way inside of the summary function.
>
> I realize that my code was kind of confusing.
>
> The variables "A"..."F" are all categorical variables. They each have
> four levels named "1st Quartile"...."4th Quartile"
>
> I tried the code below with the same result.
>
> > print(summary(f, eval(parse(text=paste(i,"='1st Quartile'", sep='')))))
>
> In the output, the reference category is different for each of the
> variables.
>
> Brian

Thanks for clarifying. That approach will NOT provide estimates at the quartiles. For example, a hazard ratio for the "upper quartile category" versus the "lower quartile category" estimates the ratio of hazards when X>Q3 to when X<Q1, where Q1 and Q3 are the outer quartiles. This is a hazard ratio between unknown mixtures of distributions, and it will not transport to another sample with a different mixture.

In addition you will have serious residual confounding with that approach by not adjusting for all the information in continuous predictors.
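
The transportability point can be sketched numerically. What follows is a rough base R illustration, not a Cox fit: it assumes a hypothetical log-linear hazard exp(beta * x) with a made-up beta, and it uses the ratio of average relative hazards in the outer groups as a crude stand-in for the category hazard ratio. The two constructed samples share the same Q1 and Q3 but differ in their tails.

```r
# Assumption: hazard proportional to exp(beta * x); beta is made up for illustration.
beta <- 0.5

# Sample 1: 1001 normal quantiles (deterministic, already sorted).
x1 <- qnorm(ppoints(1001))
q  <- quantile(x1, c(.25, .75), names = FALSE)

# Sample 2: unchanged between Q1 and Q3, tails stretched 3x away from the quartiles.
x2 <- x1
lo <- x1 < q[1]
hi <- x1 > q[2]
x2[lo] <- q[1] - 3 * (q[1] - x1[lo])
x2[hi] <- q[2] + 3 * (x1[hi] - q[2])

# Hazard ratio comparing the two points Q3 and Q1.
hr_points <- function(x) {
  qq <- quantile(x, c(.25, .75), names = FALSE)
  exp(beta * (qq[2] - qq[1]))
}

# Crude proxy for the "4th vs 1st quartile category" comparison:
# ratio of average relative hazards above Q3 and below Q1.
hr_groups <- function(x) {
  qq <- quantile(x, c(.25, .75), names = FALSE)
  mean(exp(beta * x[x > qq[2]])) / mean(exp(beta * x[x < qq[1]]))
}

print(c(sample1 = hr_points(x1), sample2 = hr_points(x2)))  # same quartiles, same ratio
print(c(sample1 = hr_groups(x1), sample2 = hr_groups(x2)))  # depends on the tails
```

The point-to-point ratio depends only on beta, Q1 and Q3, so it agrees across the two samples; the group-based ratio changes with the tail shape even though the quartiles do not.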

Frank

> -----Original Message-----
> From: Frank E Harrell Jr [mailto:f.harrell_at_vanderbilt.edu]
> Sent: Sunday, March 30, 2008 9:14 AM
> To: Wells, Brian
> Cc: r-help_at_r-project.org
> Subject: Re: [R] Defining reference category for a cph model summary
> inside of a "for" loop
>
> Wells, Brian wrote:
>
>> Dr. Harrell,
>> Thanks for your help.
>>
>> I tried:
>>
>>> print(summary(f,parse(text=paste(i,'="1st Quartile"', sep=''))))
>>
>> Same result. No error, the reference category simply doesn't change.
>
> That's good, because the default in summary is to compare the outer
> quartiles for a continuous variable. And as I said before the string
> '1st Quartile' has no special meaning for R or Design.
>
> Get what you are trying to do to work without parse (and you'll need
> eval() with parse) first. When you want total control over a setting,
> say getting a hazard ratio for the .2 to the .8 quantile, do something
> like
>
> summary(f, age=quantile(age,c(.2,.8),na.rm=TRUE))
>
> Frank
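
For a single linear term, the comparison requested by a call like the one above amounts to exp(coefficient * (high - low)). A minimal sketch of that arithmetic follows, under stated assumptions: it uses coxph from the survival package and its bundled lung data set as stand-ins for cph and the poster's data, and it computes the ratio by hand rather than through Design's summary.

```r
library(survival)

# Illustrative data only: the 'lung' data set shipped with the survival package.
fit <- coxph(Surv(time, status) ~ age, data = lung)

# Comparison points chosen explicitly: the .2 and .8 quantiles of age.
q <- quantile(lung$age, c(.2, .8), na.rm = TRUE, names = FALSE)

# With one linear term, log hazard ratio = coefficient * (high - low).
beta <- unname(coef(fit)["age"])
hr   <- exp(beta * (q[2] - q[1]))

print(c(low = q[1], high = q[2], hazard_ratio = hr))
```

With nonlinear terms or interactions the hand calculation no longer applies, which is one reason to let the modelling package's summary do the work.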
>
>> Brian 
>>
>> -----Original Message-----
>> From: Frank E Harrell Jr [mailto:f.harrell_at_vanderbilt.edu] 
>> Sent: Friday, March 28, 2008 8:34 PM
>> To: Wells, Brian
>> Cc: r-help_at_r-project.org
>> Subject: Re: [R] Defining reference category for a cph model summary
>> inside of a "for" loop
>>
>> Wells, Brian wrote:
>>> I have the following code.
>>>
>>>   f <- cph(formula = Surv(TimeToDeath, Dead == "Yes") ~ 1,
>>>            data=single.dat, x=T, y=T, surv=T)
>>>
>>>   for(i in c('A', 'B', 'C', 'D', 'E', 'F')){
>>>     f <- update(f, as.formula(paste('Surv(TimeToDeath, Dead == "Yes")~', i, sep='')))
>>>     print(summary(f, paste(i,"=1st Quartile", sep='')))
>>>   }
>>>
>>> There is no error message generated in R, but R ignores the reference
>>> category defined with paste in the summary function for the cph model.
>>>
>>> The output uses the "1st Quartile" as the reference category to
>>> calculate hazards for some of the variables defined by i, but not all
>>> of them.
>>
>> Your code is confusing.  What is to the right of ~ in a formula is a
>> predictor variable name, not a value.  If your variables are named A, B,
>> C, ... you are OK.
>>
>> '1st Quartile' has no special meaning to R or Design, and you can't pass
>> a character string as a second argument to summary and expect it to
>> work.
>>
>> You will need parse(text=paste(...)) to create an appropriate
>> expression.
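
Why a pasted string is silently ignored can be seen in base R without Design at all. In the sketch below, show_settings is a hypothetical stand-in for summary that simply returns the named settings it receives; fit is a placeholder, not a real model. It contrasts the pasted string with two ways of building a genuinely named argument: evaluating the parsed text of the whole call, and do.call with a named list. Whether a particular summary method accepts arguments built by do.call is an assumption to check against its documentation.

```r
# Hypothetical stand-in for summary(): returns the settings passed through '...'.
show_settings <- function(object, ...) {
  args <- list(...)
  if (is.null(names(args))) names(args) <- rep("", length(args))
  args
}

fit <- "placeholder for a fitted model"
i   <- "A"

# 1. A pasted string is one unnamed character value, not the setting A = "1st Quartile".
s1 <- show_settings(fit, paste(i, "=1st Quartile", sep = ""))

# 2. Build the text of the WHOLE call, then parse and evaluate it.
call_text <- paste0("show_settings(fit, ", i, " = '1st Quartile')")
s2 <- eval(parse(text = call_text))

# 3. Same result without parsing text: a named list passed through do.call().
s3 <- do.call(show_settings, c(list(fit), setNames(list("1st Quartile"), i)))

str(s1)  # unnamed element holding the string "A=1st Quartile"
str(s2)  # element named A holding "1st Quartile"
str(s3)  # same as s2
```

The do.call form keeps the variable name and its value as ordinary R objects, which tends to be easier to debug inside a loop than text that is pasted and re-parsed.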
>>
>> But Design gives you inter-quartile range hazard ratios by default
>> anyway.
>>
>> Beware of getting hazard ratios that are not adjusted for other 
>> variables needed in the model.
>>
>> Frank Harrell
>>
>>>
>>> Any help would be greatly appreciated.
>>>
>>> thanks
>>>
>>> Brian J. Wells, MD, MS
>>> Research Associate
>>> Quantitative Health Sciences
>>> Cleveland Clinic
>
-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 31 Mar 2008 - 14:48:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 31 Mar 2008 - 15:30:26 GMT.
