From: Marc Schwartz <marc_schwartz_at_comcast.net>

Date: Sat, 01 Dec 2007 13:33:21 -0600


R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 01 Dec 2007 - 19:36:33 GMT


On Sat, 2007-12-01 at 18:40 +0000, David Winsemius wrote:

> David Winsemius <dwinsemius@comcast.net> wrote in
> news:Xns99F989B3A3057dNOTwinscomcast_at_80.91.229.13:
>
> > "tom soyer" <tom.soyer_at_gmail.com> wrote in
> > news:65cc7bdf0712010951p451a993i70da89f285d801de_at_mail.gmail.com:
> >
> >> John,
> >>
> >> The Excel's percentrank function works like this: if one has a number,
> >> x for example, and one wants to know the percentile of this number in
> >> a given data set, dataset, one would type =percentrank(dataset,x) in
> >> Excel to calculate the percentile. So for example, if the data set is
> >> c(1:10), and one wants to know the percentile of 2.5 in the data set,
> >> then using the percentrank function one would get 0.166, i.e., 2.5 is
> >> in the 16.6th percentile.
> >>
> >> I am not sure how to program this function in R. I couldn't find it as
> >> a built-in function in R either. It seems to be an obvious choice for
> >> a built-in function. I am very surprised, but maybe we both missed it.
> >
> > My nomination for a function with a similar result would be ecdf(), the
> > empirical cumulative distribution function. It is of class "function" so
> > efforts to index ecdf(.)[.] failed for me.

You can use ls.str() to look into the function environment:

> ls.str(environment(ecdf(x)))
f :  num 0
method :  int 2
n :  int 25
x :  num [1:25] -2.215 -1.989 -0.836 -0.820 -0.626 ...
y :  num [1:25] 0.04 0.08 0.12 0.16 0.2 0.24 0.28 0.32 0.36 0.4 ...
yleft :  num 0
yright :  num 1

You can then use get() or mget() within the function environment to return the requisite values. Something along the lines of the following within the function percentrank():

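percentrank <- function(x, val)
{
  # ecdf() keeps the sorted data (x) and the cumulative proportions (y)
  # in the environment of the function it returns
  env.x <- environment(ecdf(x))
  res <- mget(c("x", "y"), env.x)

  # Find the stored value that equals val within numeric tolerance
  Ind <- which(sapply(seq_along(res$x),
                      function(i) isTRUE(all.equal(res$x[i], val))))

  # Return the cumulative proportion at that value
  res$y[Ind]
}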

Thus:

set.seed(1)

x <- rnorm(25)

> x
 [1] -0.62645381  0.18364332 -0.83562861  1.59528080  0.32950777
 [6] -0.82046838  0.48742905  0.73832471  0.57578135 -0.30538839
[11]  1.51178117  0.38984324 -0.62124058 -2.21469989  1.12493092
[16] -0.04493361 -0.01619026  0.94383621  0.82122120  0.59390132
[21]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575

> percentrank(x, 0.48742905)

[1] 0.56
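As a cross-check on the result above, here is a minimal sketch that evaluates the ecdf object directly instead of reading its environment. It assumes only base R; the names Fn, p_ecdf and p_mean are mine, chosen for illustration. The value is taken from the data itself (x[7]) rather than typed in from the printed output.

```r
set.seed(1)
x <- rnorm(25)

# ecdf(x) returns a step function; calling it at v gives the
# proportion of observations that are <= v
Fn <- ecdf(x)

# Evaluate at a value taken from the data (the 7th element)
p_ecdf <- Fn(x[7])

# The same proportion computed by hand
p_mean <- mean(x <= x[7])

print(c(ecdf = p_ecdf, by_hand = p_mean))
```

Note that Fn() compares exactly. A value typed in from rounded printed output, such as 0.48742905, is slightly below the stored value and so would fall on the step below, whereas percentrank() above matches it because all.equal() allows a small tolerance.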

One other approach, which returns the values and their respective rank percentiles, is:

> cumsum(prop.table(table(x)))
   -2.2146998871775   -1.98935169586337  -0.835628612410047
               0.04                0.08                0.12
 -0.820468384118015  -0.626453810742333  -0.621240580541804
               0.16                0.20                0.24
 -0.305388387156356 -0.0449336090152308 -0.0161902630989461
               0.28                0.32                0.36
 0.0745649833651906   0.183643324222082   0.329507771815361
               0.40                0.44                0.48
  0.389843236411431   0.487429052428485   0.575781351653492
               0.52                0.56                0.60
  0.593901321217509    0.61982574789471   0.738324705129217
               0.64                0.68                0.72
  0.782136300731067   0.821221195098089   0.918977371608218
               0.76                0.80                0.84
    0.9438362106853    1.12493091814311    1.51178116845085
               0.88                0.92                0.96
   1.59528080213779
               1.00
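Neither approach above reproduces the 0.166 from the quoted Excel example, since both work on the ecdf scale (k/n) and only at observed values. The following is a rough sketch of an Excel-style calculation, under the assumption that Excel places the sorted values at (rank - 1)/(n - 1) and interpolates linearly between them, which is consistent with the figure quoted for c(1:10) and 2.5. The function name percentrank_excel is hypothetical, not a base R function, and the sketch assumes the data values are distinct.

```r
# Hypothetical helper: Excel-style percent rank by linear interpolation.
# Assumes distinct values in x; not a base R function.
percentrank_excel <- function(x, val) {
  xs <- sort(x)
  n  <- length(xs)
  # Sorted values sit at positions 0, 1/(n-1), ..., 1;
  # approx() interpolates linearly and returns NA outside the data range
  approx(xs, seq(0, 1, length.out = n), xout = val)$y
}

# The example from the quoted message: 2.5 lies halfway between
# 2 (position 1/9) and 3 (position 2/9), i.e. 1.5/9
print(percentrank_excel(1:10, 2.5))
```

With tied values approx() has to collapse the duplicates, so the result may not agree with Excel in that case.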

HTH,

Marc Schwartz


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 05 Dec 2007 - 18:30:17 GMT.
