Re: [R] Need some hint on faster data manipulation.

From: Kenn Konstabel <lebatsnok_at_gmail.com>
Date: Sat, 17 May 2008 23:28:26 +0300

Could it be as simple as this:

foo <- tapply(d$tt, d$v, min)    # minimum tt within each value of v
data.frame(v = names(foo), tt = foo)
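A self-contained run of the tapply() suggestion on the example data from the original question may help. The `stringsAsFactors = FALSE` and `as.vector()` calls are additions for clarity, not part of the reply: `as.vector()` drops the array names so the row names become plain 1..n. This approach returns only the minimum value per group rather than whole rows, which is enough here because the data frame has just these two columns.

```r
# Example data from the original question
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt, stringsAsFactors = FALSE)

# tapply() groups d$tt by the values of d$v and applies min() to each group;
# the result is a 1-d array named by the sorted unique values of d$v
foo <- tapply(d$tt, d$v, min)

# Rebuild a two-column data frame; as.vector() strips the array names,
# so the row names of the result are simply 1..n
res <- data.frame(v = names(foo), tt = as.vector(foo),
                  stringsAsFactors = FALSE)
print(res)  # res$tt is c(1, 1, 4, 2, 1)
```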

On Sat, May 17, 2008 at 10:56 PM, jim holtman <jholtman_at_gmail.com> wrote:

> Is this what you want:
>
> > v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
> >
> > tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
> > d<-data.frame(v,tt)
> > do.call(rbind, lapply(split(d, d$v), function(x){
> + x[which.min(x$tt),]
> + }))
>     v tt
> v1 v1  1
> v2 v2  1
> v3 v3  4
> v4 v4  2
> v5 v5  1
> >
> >
>
>
> On Sat, May 17, 2008 at 3:48 PM, souvik banerjee <bansouvik_at_gmail.com> wrote:
>
> > Hi,
> > I am facing a problem with data manipulation. Suppose a data frame
> > contains two columns. The first column contains character values, many
> > of them repeated, and the second contains numerical values. The task is
> > to create a new data frame that contains, for each unique value in the
> > first column, the row with the minimum value in the second column. For
> > example, if "d" is the data frame created with the following R code
> >
> >
> > v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
> >
> > tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
> > d<-data.frame(v,tt)
> >
> > then the answer would be
> >
> >  v tt
> > v1  1
> > v2  1
> > v3  4
> > v4  2
> > v5  1
> >
> > I have written the small piece of R code below, which does the job
> > (assuming "d" is the initial data frame):
> >
> > # assumes the rows of each group are contiguous in d
> > b <- data.frame(NULL)
> > i <- 1
> > x <- d[1, ]                          # rows of the current group
> > while (i < dim(d)[1]) {
> >   if (length(unique(x[, 1])) == 1) {
> >     x <- rbind(x, d[i + 1, ])        # extend the current group by one row
> >     i <- i + 1
> >   }
> >   if (length(unique(x[, 1])) > 1) {  # the last row starts a new group
> >     y <- x[1:(nrow(x) - 1), ]
> >     z <- which(y[, 2] == min(y[, 2]))
> >     b <- rbind(b, y[z, ])            # keep the minimum row(s) of the group
> >     x <- d[i, ]
> >   }
> > }
> > z <- which(x[, 2] == min(x[, 2]))    # handle the final group
> > b <- rbind(b, x[z, ])
> > b
> >
> > The code works properly and gives the desired result, but I have to
> > repeat this procedure for many data frames, and nearly all of them
> > contain approximately 15,000 rows with more than 12,500 unique values in
> > the first column. Running the above code in a loop takes a considerable
> > amount of time. Can anybody suggest a faster approach?
> >
> > Regards
> >
> > Souvik Bandyopadhyay
> > Research Fellow,
> > Dept Of Statistics
> > Calcutta University
> >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>
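The original loop grows `b` and `x` with `rbind()` one row at a time, which repeatedly copies data frames and also assumes that each group's rows are adjacent; this is a likely reason it is slow at the sizes described. As a further sketch, not taken from either reply, the helper below (the name `min_row_per_group` is hypothetical) sorts by group and then by value with `order()`, so each group's minimum comes first, and keeps that first row with `!duplicated()`. The simulated 15,000-row data set with 13,000 distinct keys is an assumption standing in for the poster's data, and no timings are claimed.

```r
# Hypothetical helper (not from the thread): for each value of `key`, keep the
# row with the smallest `value`. Sorting by key and then by value puts each
# group's minimum first; duplicated() then drops every later row of the group.
min_row_per_group <- function(df, key, value) {
  o <- df[order(df[[key]], df[[value]]), , drop = FALSE]
  res <- o[!duplicated(o[[key]]), , drop = FALSE]
  rownames(res) <- NULL  # reset row names to 1..n
  res
}

# Example data from the original question
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt, stringsAsFactors = FALSE)
small <- min_row_per_group(d, "v", "tt")
print(small)

# Simulated data of roughly the size described (assumed, not the poster's):
# 13,000 distinct keys, each appearing at least once, in 15,000 shuffled rows
set.seed(1)
keys <- sprintf("k%05d", 1:13000)
big <- data.frame(
  v  = sample(c(keys, sample(keys, 2000, replace = TRUE))),
  tt = round(runif(15000, 0, 100), 2),
  stringsAsFactors = FALSE
)
timing <- system.time(big_res <- min_row_per_group(big, "v", "tt"))
print(timing)

# Cross-check the group minima against the tapply() approach
ref <- tapply(big$tt, big$v, min)
stopifnot(identical(big_res$v, names(ref)),
          identical(big_res$tt, as.vector(ref)))
```

Unlike the original loop, which keeps every tied minimum row, this keeps exactly one row per group (as Jim's which.min() version does), and it does not require the rows of a group to be adjacent. Because it returns whole rows, it also carries along any extra columns the data frame might have.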




Received on Sat 17 May 2008 - 20:33:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 17 May 2008 - 21:30:56 GMT.

