From: maneesh deshpande <dmaneesh_at_hotmail.com>

Date: Thu 23 Feb 2006 - 14:45:37 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Thu Feb 23 14:55:40 2006

Thanks,

Maneesh

>From: Adaikalavan Ramasamy <ramasamy@cancer.org.uk>
>Reply-To: ramasamy@cancer.org.uk
>To: maneesh deshpande <dmaneesh@hotmail.com>
>CC: r-help@stat.math.ethz.ch
>Subject: Re: [R] Ranking within factor subgroups
>Date: Wed, 22 Feb 2006 03:44:45 +0000
>
>It might help to give a simple reproducible example in the future. For
>example,
>
>  df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
>                          B=rpois(500, 50), C=rpois(500, 30) )
>
>might generate something like
>
>      date   A  B  C
>  1      1  93 51 32
>  2      1  95 51 30
>  3      1 102 59 28
>  4      1 105 52 32
>  5      1 105 53 26
>  6      1  99 59 37
>  ...    . ... .. ..
>  495    5 100 57 19
>  496    5  96 47 44
>  497    5 111 56 35
>  498    5 105 49 23
>  499    5 105 61 30
>  500    5  92 53 32
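Because rpois() draws random numbers, the rows shown above will differ on every run. A sketch of a fully reproducible version, assuming an arbitrary seed of 1, fixes the random number generator first; data.frame() builds the same four columns that cbind.data.frame() does:

```r
# Fix the RNG so the same data are generated every time (the seed value is arbitrary)
set.seed(1)
df <- data.frame(date = rep(1:5, each = 100),
                 A = rpois(500, 100),
                 B = rpois(500, 50),
                 C = rpois(500, 30))
head(df)        # first six rows
table(df$date)  # 100 rows for each of the 5 dates
```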
>
>Here is my proposed solution. Can you double-check it against your
>existing functions to see whether it is correct?
>
>  decile.fn <- function(x, nbreaks=10){
>    br <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
>    br[1] <- -Inf   # so that the minimum falls into the first decile
>    return( cut(x, br, labels=FALSE) )
>  }
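A quick way to check decile.fn is to feed it values whose deciles are known. The sketch below restates the function with TRUE/FALSE written out and applies it to 1:20, which should put two values in each decile. It also shows a limitation: when many values are tied, quantile() returns repeated break points and cut() stops with the error "'breaks' are not unique".

```r
decile.fn <- function(x, nbreaks = 10) {
  # Decile boundaries; the lowest is replaced by -Inf so the minimum is kept
  br <- quantile(x, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf
  cut(x, br, labels = FALSE)  # integer decile codes 1..nbreaks
}

decile.fn(1:20)
# [1]  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10

# Heavily tied data: several quantiles coincide, so cut() signals an error
res <- try(decile.fn(c(rep(1, 10), 2:5)), silent = TRUE)
inherits(res, "try-error")
# [1] TRUE
```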
>
>  out <- apply( df[ ,c("A", "B", "C")], 2,
>                function(v) unlist( tapply( v, df$date, decile.fn ) ) )
>
>  rownames(out) <- rownames(df)
>  out <- cbind(df$date, out)
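One caveat with the apply()/tapply() version: unlist(tapply(...)) returns the results grouped by date, so they line up with rownames(df) only when the rows are already sorted by date, as they are in the example above. A sketch of an order-independent alternative uses base R's ave(), which applies a function within groups and returns the results in the original row order. The shuffled rows and the use of rnorm() (continuous values, so ties cannot trigger the "'breaks' are not unique" error) are assumptions made for this illustration.

```r
decile.fn <- function(x, nbreaks = 10) {
  br <- quantile(x, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf
  cut(x, br, labels = FALSE)
}

# Example data with the rows shuffled, so the dates are no longer contiguous
set.seed(2)
df <- data.frame(date = rep(1:5, each = 100),
                 A = rnorm(500), B = rnorm(500), C = rnorm(500))
df <- df[sample(nrow(df)), ]

# ave() applies decile.fn within each date and puts each result back in its own row
cols <- c("A", "B", "C")
out <- sapply(cols, function(v) ave(df[[v]], df$date, FUN = decile.fn))
rownames(out) <- rownames(df)
out <- cbind(date = df$date, out)

# Every date should have exactly 10 rows in each decile of A
table(out[, "date"], out[, "A"])
```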
>
>Regards, Adai
>
>On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> > Hi,
> >
> > I have a data frame, x, of the following form:
> >
> >   Date     Symbol  A  B  C
> >   20041201 ABC    10 12 15
> >   20041201 DEF     9  5  4
> >   ...
> >   20050101 ABC     5  3  1
> >   20050101 GHM    12  4  2
> >   ...
> >
> > Here A, B and C are properties of a set of symbols recorded on a given
> > date. For each date and property I want to decile the symbols and create
> > another set of columns, "bucketA", "bucketB" and "bucketC", containing the
> > decile rank of each symbol. The following non-vectorized code does what I
> > want:
> >
> >   bucket <- function(data, nBuckets) {
> >     q <- quantile(data, seq(0, 1, len = nBuckets + 1), na.rm = TRUE)
> >     q[1] <- q[1] - 0.1  # need to do this to ensure there are no extra NAs
> >     cut(data, q, include.lowest = TRUE, labels = FALSE)
> >   }
> >
> >   calcDeciles <- function(x, colNames) {
> >     nBuckets <- 10
> >     dates <- unique(x$Date)
> >     for (date in dates) {
> >       iVec <- x$Date == date
> >       xx <- x[iVec, ]
> >       for (colName in colNames) {
> >         data <- xx[, colName]
> >         bColName <- paste("bucket", colName, sep = "")
> >         x[iVec, bColName] <- bucket(data, nBuckets)
> >       }
> >     }
> >     x
> >   }
> >
> >   x <- calcDeciles(x, c("A", "B", "C"))
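For comparison, here is a sketch of calcDeciles() without the loop over dates, assuming the same bucket() function and a data frame with a Date column plus the property columns described above. It still loops over the (few) property columns, but ave() handles all dates for a column in one call and returns the buckets in the original row order. The name calcDecilesVec and the small test data frame are hypothetical.

```r
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1  # shift the lowest break down, as in the original
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical vectorized replacement for calcDeciles()
calcDecilesVec <- function(x, colNames, nBuckets = 10) {
  for (colName in colNames) {
    bColName <- paste("bucket", colName, sep = "")
    # ave() splits the column by Date, applies bucket() to each piece,
    # and reassembles the pieces in the original row order
    x[[bColName]] <- ave(x[[colName]], x$Date,
                         FUN = function(v) bucket(v, nBuckets))
  }
  x
}

# Small made-up example: two dates with 20 symbols each
x <- data.frame(Date = rep(c(20041201, 20050101), each = 20),
                Symbol = paste("S", 1:40, sep = ""),
                A = c(1:20, 20:1),
                B = c(seq(2, 40, by = 2), 1:20),
                C = c(rev(seq(2, 40, by = 2)), 1:20))
x <- calcDecilesVec(x, c("A", "B", "C"))
head(x)
```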
> >
> >
> > I was wondering whether it is possible to vectorize the above function to
> > make it more efficient. I tried
> >
> >   rlist <- tapply(x$A, x$Date, bucket, 10)
> >
> > but I am not sure how to assign the contents of "rlist" to their
> > appropriate slots in the original data frame.
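On the narrower question of how to put the tapply() results back into the data frame: base R's unsplit() reverses split(), placing each group's values back at that group's original row positions. A minimal sketch, assuming the bucket() function above and a hypothetical six-row data frame with two interleaved dates and three buckets:

```r
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data: the two dates are interleaved rather than sorted
x <- data.frame(Date = c(2, 1, 2, 1, 2, 1),
                A    = c(10, 3, 30, 1, 20, 2))

# tapply() passes the extra argument nBuckets on to bucket()
rlist <- tapply(x$A, x$Date, bucket, nBuckets = 3)

# unsplit() puts each date's buckets back into that date's rows
x$bucketA <- unsplit(rlist, x$Date)
x$bucketA
# [1] 1 3 3 1 2 2
```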
> >
> > Thanks,
> >
> > Maneesh
> >

This archive was generated by hypermail 2.1.8: Fri 03 Mar 2006 - 03:42:39 EST