Re: [R] paired t-test with bootstrap

From: Marc Schwartz
Date: Tue 13 Jul 2004 - 23:23:44 EST

On Tue, 2004-07-13 at 07:28, Petr Pikal wrote:
> Hi
> On 13 Jul 2004 at 12:28, luciana wrote:
> > Dear Sirs,
> >
> > I am a beginning R user. By means of R, I would like to apply the
> > bootstrap to my data in order to test cost differences between
> > independent or paired samples of people affected by a certain
> > disease.
> >
> > My problem is that even though I am reading the book by Efron
> > (An Introduction to the Bootstrap), looking at the examples on the
> > internet and those available in R, and learning a lot of theory about
> > the bootstrap, I can't apply the bootstrap with R to my data because
> > of many doubts and difficulties. This is why I have decided to
> > ask the experts for help.
> >
> >
> >
> > I have a sample of diabetic people, matched (by age and sex) with a
> > control sample. The variable I would like to compare is their monthly
> > drug and hospital cost. The cost variable has a distribution that is
> > very far from Gaussian, but I still need to compare the means of the
> > two groups. So, in the specific case of a paired-sample t-test, I aim
> > to test whether the mean difference in cost is close to 0. What is the
> > best way to do that?
> >
> >
> >
> > Another question is that sometimes I have missing data in my dataset
> > (for example, I have the cost for a patient but not for the matched
> > control). If I enter NA or a dot, R doesn't compute the statistic I
> > need (for instance, the mean). To overcome this problem I have replaced
> > the missing data with the mean computed from the rest of the data.
> > However, I think R can actually compute the mean even in the presence
> > of missing data. Is that right? What can I do?
> your.statistic(x, na.rm=TRUE)
> e.g.
> mean(x, na.rm=TRUE)
> or look at ?na.action, e.g. mean(na.omit(x))
> Cheers
> Petr Pikal

A couple of other thoughts here with respect to the use of a paired t-test for the comparison.

As Luciana notes above, cost data is typically highly skewed, raising doubt as to the use of a simple parametric test to compare the two groups.
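
As a rough illustration of one approach that does not lean on normality, here is a minimal sketch of a percentile bootstrap for the mean of the within-pair cost differences, using base R only. The data are simulated from a log-normal distribution purely for illustration; the names cost_case and cost_control, the sample size and the choice of B = 2000 resamples are all assumptions, not anything taken from Luciana's data. Note that the pairs (that is, the differences) are resampled, not the two groups separately, so the matching is preserved.

```r
set.seed(42)   # for a reproducible illustration only

# Hypothetical skewed monthly costs for n matched pairs (simulated, not real data)
n <- 50
cost_case    <- rlnorm(n, meanlog = 5.0, sdlog = 1)
cost_control <- rlnorm(n, meanlog = 4.7, sdlog = 1)

# Within-pair differences: the paired analysis works on these
d <- cost_case - cost_control

# Resample the differences with replacement and recompute the mean each time
B <- 2000
boot_means <- replicate(B, mean(sample(d, replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the mean difference
ci <- quantile(boot_means, c(0.025, 0.975))

mean(d)   # observed mean difference
ci        # an interval excluding 0 would suggest a nonzero mean difference
```

The percentile interval is the simplest variant. The boot package (boot() and boot.ci()) offers others, such as BCa intervals, which may behave better with heavily skewed data.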

One of the many reasons such data is skewed is that there are notable differences in the populations that are not accounted for when using simple characteristics for matching as is done here. What makes a patient an "outlier" with respect to cost and how does the distribution of these patients differ between the two groups and the individual pairs?

For example, are all the patients in both groups insulin dependent or are some controlled with oral agents or diet alone? If all are using insulin, are some using self-administered injections while others are using implanted infusion pumps? What is the interval from disease onset? Have any had pancreas or islet cell transplants? Do the matched patients have similar diabetes-related sequelae such as diabetic retinopathy, neuropathy, vasculopathy, renal dysfunction and others? If not, the costs to treat these other issues, such as dialysis and wound care alone, can dramatically alter the cost profile for patients even when matched by age and gender.

If you are not considering these issues (e.g. through inclusion/exclusion criteria), you risk significant challenges to your conclusions with respect to the comparison of costs for these two groups. I would raise similar concerns about using a sample mean as the imputed value for missing data.
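
To make the concern about mean imputation concrete, here is a small sketch with made-up numbers (the cost, case and control vectors are hypothetical). Filling each NA with the observed mean leaves the mean unchanged but adds values with zero deviation, so the standard deviation is understated. For paired data, one alternative, shown at the end, is to drop any pair in which either member is missing; whether that is appropriate depends on why the values are missing.

```r
# Hypothetical monthly costs; two values are missing (made-up numbers)
cost <- c(120, 85, NA, 430, 60, NA, 2500, 95)

mean(cost)                      # NA: missing values propagate by default
m <- mean(cost, na.rm = TRUE)   # mean of the observed values only

# Mean imputation: replace each NA with the observed mean
imputed <- ifelse(is.na(cost), m, cost)

# Same mean, but the spread shrinks after imputation
sd_observed <- sd(cost, na.rm = TRUE)
sd_imputed  <- sd(imputed)
sd_imputed < sd_observed        # TRUE

# For paired data, keep only the pairs where both costs are present
case    <- c(120, 85, NA, 430)
control <- c(100, NA, 70, 300)
ok <- complete.cases(case, control)
d  <- case[ok] - control[ok]    # differences for the complete pairs only
```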

If you have not done so already, a Medline search of the literature would be in order to better understand what others have done in this area for diabetic treatment costs and the pros and cons of their respective approaches. I suspect that others here will have additional recommendations.

HTH,

Marc Schwartz

Received on Tue Jul 13 23:32:03 2004
