Re: [R] Ranking within factor subgroups

From: Adaikalavan Ramasamy <ramasamy_at_cancer.org.uk>
Date: Wed 22 Feb 2006 - 14:44:45 EST

It might help to give a simple reproducible example in the future. For example,

 df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
                         B=rpois(500, 50), C=rpois(500, 30) )

might generate something like

	    date   A  B  C
	  1    1  93 51 32
	  2    1  95 51 30
	  3    1 102 59 28
	  4    1 105 52 32
	  5    1 105 53 26
	  6    1  99 59 37
	...    . ... .. ..
	495    5 100 57 19
	496    5  96 47 44
	497    5 111 56 35
	498    5 105 49 23
	499    5 105 61 30
	500    5  92 53 32

Here is my proposed solution. Can you double-check it against your existing function to see whether it gives the same results?

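   decile.fn <- function(x, nbreaks=10){
     # breaks at the 0%, 10%, ..., 100% quantiles of x
     br     <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
     br[1]  <- -Inf   # so the minimum falls into the first bucket
     return( cut(x, br, labels=FALSE) )
   }

   # unlist(tapply(...)) returns the buckets grouped by date, so this
   # assumes df is already sorted by date (as in the example above)
   out <- apply( df[ ,c("A", "B", "C")], 2,
                 function(v) unlist( tapply( v, df$date, decile.fn ) ) )

   rownames(out) <- rownames(df)
   out <- cbind(date=df$date, out)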
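If the data frame is not sorted by date, the rows produced by unlist(tapply(...)) no longer line up with the rows of df. A minimal sketch that does not depend on row order uses base R's ave(), which applies FUN within each group and writes the results back into the original row positions. The helper add.buckets and its arguments are hypothetical; the bucketA/bucketB/bucketC column names follow the original question.

```r
# decile.fn as above: bucket x into nbreaks groups using quantile breaks
decile.fn <- function(x, nbreaks = 10) {
  br    <- quantile(x, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf  # make sure the minimum lands in bucket 1
  cut(x, br, labels = FALSE)
}

# Hypothetical helper: add one "bucket<col>" column per property.
# ave() splits df[[col]] by df[[group]], applies decile.fn to each piece and
# puts the results back in the original row positions, so df need not be sorted.
add.buckets <- function(df, cols, group) {
  for (col in cols) {
    df[[paste0("bucket", col)]] <- ave(df[[col]], df[[group]], FUN = decile.fn)
  }
  df
}

# Usage on the example data (any row order works):
# df <- add.buckets(df, c("A", "B", "C"), "date")
```

Like decile.fn itself, this still relies on quantile() returning distinct breaks within each date.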

Regards, Adai

On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> Hi,
>
> I have a data frame, x, of the following form:
>
> Date Symbol A B C
> 20041201 ABC 10 12 15
> 20041201 DEF 9 5 4
> ...
> 20050101 ABC 5 3 1
> 20050101 GHM 12 4 2
> ....
>
> Here A, B, C are properties of a set of symbols recorded for a given date.
> I want to decile the symbols for each date and property and create another
> set of columns "bucketA", "bucketB", "bucketC" containing the decile rank
> for each symbol. The following non-vectorized code does what I want:
>
> bucket <- function(data,nBuckets) {
>   q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=T)
>   q[1] <- q[1] - 0.1 # need to do this to ensure there are no extra NAs
>   cut(data,q,include.lowest=T,labels=F)
> }
>
> calcDeciles <- function(x,colNames) {
>   nBuckets <- 10
>   dates <- unique(x$Date)
>   for ( date in dates) {
>     iVec <- x$Date == date
>     xx <- x[iVec,]
>     for (colName in colNames) {
>       data <- xx[,colName]
>       bColName <- paste("bucket",colName,sep="")
>       x[iVec,bColName] <- bucket(data,nBuckets)
>     }
>   }
>   x
> }
>
> x <- calcDeciles(x,c("A","B","C"))
>
>
> I was wondering if it is possible to vectorize the above function to make it
> more efficient. I tried
> rlist <- tapply(x$A,x$Date,bucket,10)
> but I am not sure how to assign the contents of "rlist" to their appropriate
> slots in the original data frame.
>
> Thanks,
>
> Maneesh
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
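Both decile.fn and bucket derive their breaks from quantile(), so on data with many tied values (count data such as rpois() output, for instance) two breaks can coincide, and cut() then stops with "'breaks' are not unique". One possible workaround, sketched here under the assumption that roughly equal-sized buckets are acceptable, is to bucket by rank instead. This is a different rule from the quantile-based one: ties are broken by order of appearance (ties.method = "first"), so equal values can land in adjacent buckets. The name rank.decile is hypothetical.

```r
# Hypothetical helper: roughly equal-count buckets from ranks instead of quantile breaks.
# Ties are broken by order of appearance (ties.method = "first"), so equal values
# may be split across adjacent buckets. NAs stay NA and are not counted in n.
rank.decile <- function(x, nbreaks = 10) {
  r <- rank(x, na.last = "keep", ties.method = "first")
  n <- sum(!is.na(x))
  as.integer(ceiling(nbreaks * r / n))
}

v <- c(5, 5, 5, 5, 1, 9, NA)
rank.decile(v, nbreaks = 4)   # returns 2 2 3 4 1 4 NA
```

It can be applied per date in the same way, e.g. x$bucketA <- ave(x$A, x$Date, FUN = rank.decile).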



Received on Wed Feb 22 15:01:01 2006
