Re: [R] Cox model

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Wed, 13 Feb 2008 09:30:42 -0500

On 2/13/2008 9:08 AM, Gustaf Rydevik wrote:

> On Feb 13, 2008 3:06 PM, Gustaf Rydevik <gustaf.rydevik_at_gmail.com> wrote:

>> On Feb 13, 2008 2:37 PM, Matthias Gondan <matthias-gondan@gmx.de> wrote:
>> > Hi Eleni,
>> >
>> > The problem of this approach is easily explained: Under the Null
>> > hypothesis, the P values
>> > of a significance test are random variables, uniformly distributed in
>> > the interval [0, 1]. It
>> > is easily seen that the lowest of these P values is not any 'better'
>> > than the highest of the
>> > P values.
>> >
>> > Best wishes,
>> >
>> > Matthias
>> >
>>
>> Correct me if I'm wrong, but isn't that the point? I assume that the
>> hypothesis is that one or more of these genes are true predictors,
>> i.e. for these genes the p-value should be significant. For all the
>> other genes, the p-value is uniformly distributed. Using a
>> significance level of 0.01, and an a priori knowledge that there are
>> significant genes, you will end up with on the order of 20 genes, some
>> of which are the "true" predictors, and the rest being false
>> positives. This set of 20 genes can then be further analysed. A much
>> smaller and easier problem to solve, no?
>>
>>
>> /Gustaf
> 
> Sorry, it should say 200 genes instead of 20.
> 

I agree with your general point, but want to make one small quibble: the choice of 0.01 as a cutoff depends pretty strongly on the distribution of the p-value under the alternative. With a small sample size and/or a small effect size, that may miss the majority of the true predictors. You may need it to be 0.1 or higher to catch most of them, and then you'll have 10 times as many false positives to wade through (but still 10 times fewer than you started with, so your main point still holds).
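
A rough illustration of this trade-off, as a minimal Python sketch. It assumes 20,000 null genes (the number implied by 200 false positives at a 0.01 cutoff), a two-sided z-test, and a hypothetical noncentrality of 2 for the true predictors; apart from the two cutoffs, none of these values come from the thread, and the function names are made up for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_two_sided_z(alpha, noncentrality):
    """P(p-value < alpha) for a two-sided z-test whose statistic
    is distributed N(noncentrality, 1) under the alternative."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower_tail = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


def expected_false_positives(alpha, n_null):
    """Null p-values are uniform on [0, 1], so a fraction alpha
    of the null genes falls below the cutoff on average."""
    return alpha * n_null


N_NULL_GENES = 20_000   # assumption: implied by "200 genes" at 0.01
NONCENTRALITY = 2.0     # assumption: a weak signal, chosen for illustration

for alpha in (0.01, 0.1):
    power = power_two_sided_z(alpha, NONCENTRALITY)
    false_pos = expected_false_positives(alpha, N_NULL_GENES)
    print(f"cutoff {alpha}: power {power:.2f}, "
          f"expected false positives {false_pos:.0f}")
```

Under these assumptions the 0.01 cutoff catches well under half of the true predictors, while 0.1 catches most of them at the cost of ten times as many false positives. Setting the noncentrality to 0 returns a power equal to the cutoff itself, which is the uniform-under-the-null behaviour described earlier in the thread.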

Duncan Murdoch



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 13 Feb 2008 - 15:00:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 13 Feb 2008 - 15:30:13 GMT.
