Re: [R] getting percentiles by factor

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Thu, 10 Mar 2011 20:20:45 -0500

On Mar 10, 2011, at 11:08 AM, Paolo Cavatore wrote:

> Hi David,
>
> Thanks for your comment. I managed to sort it out.
>
> Below is the final code.
> Paolo
>
> #################################
> myExample <- data.frame(Ret=seq(-2, 2.5, by=0.5), PE=seq(10,19), Sectors=rep(c("Financial","Industrial"),5))
> myExample <- na.omit(myExample)
> myecdf2 <- function(x, column, sortAsc="True") {

That still looks suspect: "True" is a character string, not the logical value TRUE.
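For what it's worth, `if ("True")` only works because `if()` coerces a character string such as "True" to a logical. A value like "yes", or a typo like "Ture", fails at run time with "argument is not interpretable as logical". Here is a minimal sketch that insists on a real logical flag up front. `percRank` and the toy data frame are hypothetical, not from the thread:

```r
# Hypothetical helper: percentile (0-100) of each value of `column` in `x`.
# sortAsc must be a single TRUE/FALSE, not a string such as "True".
percRank <- function(x, column, sortAsc = TRUE) {
  stopifnot(is.logical(sortAsc), length(sortAsc) == 1L, !is.na(sortAsc))
  p <- ecdf(x[[column]])(x[[column]]) * 100  # % of values <= each value
  if (sortAsc) p else 100 - p                # equals abs(p - 100), as p <= 100
}

df <- data.frame(Ret = c(-1, 0.5, 2))
percRank(df, "Ret")                   # about 33.3, 66.7, 100
percRank(df, "Ret", sortAsc = FALSE)  # about 66.7, 33.3, 0
# percRank(df, "Ret", sortAsc = "True") stops at the stopifnot() check
```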

> # x data.frame/list being analysed
> # column to calculate percentile on
> # sortAsc sorting order (True Ascending, False Descending)
> w1 <- ecdf(x[[column]])
> w2 <- if (sortAsc) w1(x[[column]]) * 100 else abs(w1(x[[column]]) * 100 - 100)
> w3 <- transform(x, myPerc=w2)
> names(w3)[ncol(w3)] <- paste("Perc.",column,sep="")
> return(w3)
> }
> myExampleEnd2 <- lapply(split(myExample, myExample$Sectors), myecdf2, column="Ret", sortAsc="True")
> myExampleEnd2 <- unsplit(myExampleEnd2, myExample$Sectors)

That's interesting. I was not aware of unsplit. I would probably have tried:
  do.call("rbind", myExampleEnd2)

... but whatever works.
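The two approaches do not give the same row order. `split()` returns one data frame per sector, in level order. Stacking the pieces with `do.call(rbind, ...)` therefore yields all Financial rows followed by all Industrial rows, while `unsplit()` puts every row back in its original position. Here is a small sketch on the example data from the thread. `pieces`, `stacked` and `restored` are just illustrative names:

```r
myExample <- data.frame(Ret = seq(-2, 2.5, by = 0.5), PE = seq(10, 19),
                        Sectors = rep(c("Financial", "Industrial"), 5))
pieces <- split(myExample, myExample$Sectors)

stacked  <- do.call(rbind, pieces)              # grouped: Financial rows first
restored <- unsplit(pieces, myExample$Sectors)  # original row order

stacked$Ret   # -2 -1 0 1 2 -1.5 -0.5 0.5 1.5 2.5
restored$Ret  # same order as myExample$Ret
```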

(Except please stop sending HTML mail.)

-- 
David

> 2011/3/10 David Winsemius <dwinsemius@comcast.net>
>
> On Mar 10, 2011, at 3:37 AM, Paolo Cavatore wrote:
>
> Hello,
>
> I'm trying to get percentiles (PERCENTRANK for Excel users) by factor in the following data.frame:
>
> myExample <- data.frame(Ret=seq(-2, 2.5, by=0.5), PE=seq(10,19), Sectors=rep(c("Financial","Industrial"),5))
> myExample <- na.omit(myExample)
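Note that `ecdf()` and Excel's PERCENTRANK use different scales. `ecdf(v)(v)` is the fraction of values less than or equal to each value, so the smallest of n distinct values gets 1/n rather than 0. For distinct values, PERCENTRANK is commonly described as (rank - 1)/(n - 1), which runs from 0 to 1. If matching Excel matters, a sketch along these lines may help. `excelPercentRank` is a hypothetical name, and letting ties share the lowest rank is an assumption that has not been checked against Excel:

```r
# Hypothetical helper: Excel-style percent rank, assumed to be
# (rank - 1) / (n - 1), with tied values sharing the lowest rank.
# Needs at least two values (n = 1 would divide by zero).
excelPercentRank <- function(v) {
  (rank(v, ties.method = "min") - 1) / (length(v) - 1)
}

v <- c(-2, -1, 0, 1, 2)
ecdf(v)(v)           # 0.2 0.4 0.6 0.8 1.0
excelPercentRank(v)  # 0.00 0.25 0.50 0.75 1.00
```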
>
> Thanks to Patrick, I managed to put together the following lines, which do it for the "Ret" column:
>
> myecdf <- function(x, sortAsc) {
> w1 <- ecdf(x$Ret)
> w2 <- if (sortAsc) w1(x$Ret) * 100 else abs(w1(x$Ret) * 100 - 100)
> w3 <- transform(x, myPerc=w2)
> return(w3)
> }
> myExampleEnd <- lapply(split(myExample, myExample$Sectors), myecdf, sortAsc="True")
> myExampleEnd <- unsplit(myExampleEnd, myExample$Sectors)
>
>
> I need to make the function more flexible by accepting the name of the column to calculate percentiles on as a parameter, but the following doesn't work:
>
> myecdf2 <- function(x, column, sortAsc=True) {
> # x data.frame/list being analysed
> # column to calculate percentiles on
> # sortAsc sorting order (True Ascending, False Descending)
> w1 <- ecdf(x$column)
> w2 <- if (sortAsc) w1(x$column) * 100 else abs(w1(x$column) * 100 - 100)
> w3 <- transform(x, myPerc=w2)
> return(w3)
> }
> myExampleEnd2 <- lapply(split(myExample, myExample$Sectors), myecdf2, column=Ret, sortAsc="True")
> myExampleEnd2 <- unsplit(myExampleEnd, myExample$Sectors)
>
>
> I'm not sure whether I'm going down the right path, so any help is appreciated, even a solution from scratch.
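Since help "from scratch" is welcome: base R's `ave()` applies a function within each level of a grouping variable and returns the results aligned with the original rows. That collapses the split/lapply/unsplit round trip into a single call. The sketch below handles ascending order only. `withinGroupPerc` is a hypothetical helper, and the `Perc.` prefix mirrors the column naming Paolo uses in his final code:

```r
myExample <- data.frame(Ret = seq(-2, 2.5, by = 0.5), PE = seq(10, 19),
                        Sectors = rep(c("Financial", "Industrial"), 5))

# Percentile (0-100) of each value within its own group, ascending order.
withinGroupPerc <- function(v) ecdf(v)(v) * 100

column <- "Ret"  # the column is chosen by name, as the question asks
# ave() returns a vector in the original row order, so no unsplit() is needed.
myExample[[paste("Perc.", column, sep = "")]] <-
  ave(myExample[[column]], myExample$Sectors, FUN = withinGroupPerc)

myExample$Perc.Ret  # 20 20 40 40 60 60 80 80 100 100
```

For descending order, `function(v) 100 - ecdf(v)(v) * 100` can be passed as `FUN` instead.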
>
> I haven't debugged the code above, but the first step is to replace all your x$column instances with the equivalent:
>
> x[[column]]
>
> Moral: don't use $ when you want the column argument to be evaluated.
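The difference is easy to see on a toy data frame (`df` and `column` here are illustrative only). `$` takes the text after it literally as a column name, whereas `[[` evaluates its argument. This is also why the call needs `column = "Ret"`: an unquoted `Ret` would be looked up as an R object once the argument is evaluated, not used as a name.

```r
df <- data.frame(Ret = c(1, 2, 3), PE = c(10, 11, 12))
column <- "Ret"

df$column     # NULL: `$` looks for a column literally named "column"
df[[column]]  # 1 2 3: `[[` evaluates `column` and uses its value, "Ret"
```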
>
> Efforts to debug revealed an extraneous ")" after 100 and the need to replace column=Ret with column="Ret", but then foundered at:
>
>
> myExampleEnd2 <- unsplit(myExampleEnd, myExample$Sectors)
> Error in inherits(x, "data.frame") : object 'myExampleEnd' not found
>
> Removing the presumptively extraneous "End" does allow completion, although there are warnings and I cannot follow the overall intent, so absolutely no guarantees.
>
> > myExampleEnd2 <- unsplit(myExample, myExample$Sectors)
> Warning messages:
> 1: In x[i] <- value[[j]] :
> number of items to replace is not a multiple of replacement length
> 2: In x[i] <- value[[j]] :
> number of items to replace is not a multiple of replacement length
> > myExampleEnd2
> [1] -2.0 10.0 -1.5 11.0 -1.0 12.0 -0.5 13.0 0.0 14.0
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 11 Mar 2011 - 01:28:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Mar 2011 - 01:50:22 GMT.

