RE: [R] Rank-based p-value on large dataset

From: Huntsinger, Reid <reid_huntsinger_at_merck.com>
Date: Fri 04 Mar 2005 - 09:38:50 EST


When you say the 130,000 points are from the empirical distribution, how did you get them? Is each one really one of the values of y? If you sorted y first, would you know which one (i.e., at which index) each x is? (Sorting 80,000 elements took essentially no time at all on my sub-gigahertz Pentium III.) But maybe that's not an option; more details would help.

Reid Huntsinger
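The point about sorting y first can be made concrete: once y is sorted, the position of each x within it gives the count of y values at or below x, and hence the one-sided p-value, without comparing every x against every y. A minimal sketch in base R, assuming x and y are plain numeric vectors with no NAs. `emp_pval` is a hypothetical helper name, and using `findInterval()` to locate each x in the sorted y is one possible choice, not something proposed in the thread:

```r
# Hypothetical helper (not from the thread): one-sided empirical p-values,
# the proportion of y strictly greater than each x[i], computed via one sort.
# Assumes x and y are numeric vectors with no NAs.
emp_pval <- function(x, y) {
  ys <- sort(y)                     # sort y once: O(n log n)
  n_le <- findInterval(x, ys)       # for each x[i], how many sorted y values are <= x[i]
  (length(ys) - n_le) / length(ys)  # count of y > x[i], divided by length(y)
}

# Usage: check against the original loop on small simulated data
set.seed(1)
y <- rnorm(1000)
x <- rnorm(500)
p.loop <- numeric(length(x))
for (i in seq_along(x)) p.loop[i] <- sum(y > x[i]) / length(y)
stopifnot(isTRUE(all.equal(emp_pval(x, y), p.loop)))
```

Because `findInterval()` does a binary search per element of x, the total cost is roughly one sort of y plus length(x) * log(length(y)) steps, instead of length(x) * length(y) comparisons.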

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Sean Davis
Sent: Thursday, March 03, 2005 5:22 PM
To: r-help
Subject: [R] Rank-based p-value on large dataset

I have a fairly simple problem. I have about 80,000 values (call them y) that I am using as an empirical distribution, and I want to find the p-value (never mind the multiple-testing issues here, for the time being) of each of 130,000 points (call them x) with respect to that empirical distribution. For a one-sided test I typically do something like

p.val <- numeric(length(x))
for (i in seq_along(x))
  p.val[i] <- sum(y > x[i]) / length(y)  # fraction of y above x[i]

However, both x and y are large here, so this takes quite a long time: the loop makes length(x) * length(y), roughly 1.04e10, comparisons. Any suggestions?

Thanks,
Sean
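For comparison, the loop in the question can also be replaced by a single vectorized call using base R's `ecdf()` (from the stats package, attached by default), which builds the empirical CDF of y once. Evaluating it at all of x gives the proportion of y at or below each point, and one minus that is the same one-sided p-value. This is a sketch under the assumption that x and y are numeric vectors without NAs; here they are simulated at the sizes described, since the real data are not shown:

```r
# Sketch: vectorized replacement for the loop, using the empirical CDF of y.
# Assumes numeric x and y without NAs; simulated data stand in for the real values.
set.seed(42)
y <- rnorm(80000)
x <- rnorm(130000)

Fn <- ecdf(y)       # step function: Fn(t) = proportion of y values <= t
p.val <- 1 - Fn(x)  # proportion of y values > x[i], for every i at once
```

Internally `ecdf()` sorts the unique values of y, so this is essentially the same sort-based idea; the subtraction from 1 may differ from sum(y > x[i]) / length(y) only in floating-point rounding.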



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Received on Fri Mar 04 09:43:28 2005
