[R] Need some hint on faster data manipulation.

From: souvik banerjee <bansouvik_at_gmail.com>
Date: Sun, 18 May 2008 01:18:58 +0530


Hi,

I am facing a problem in data manipulation. Suppose a data frame contains two columns. The first column consists of character values, some of them repeated, and the second consists of numerical values. The problem is to create a new data frame that contains, for each unique value in the first column, the row with the minimum value in the second column. For example, if "d" is the data frame created with the following R code

            v<-c(rep("v1",3), rep("v2",4), rep("v3",2),"v4",rep("v5",6))
            tt<-c(1,2,3,3,1,2,3,4,5,2,7,9,2,3,1,4)
            d<-data.frame(v,tt)

then the answer would be

                          v         tt
                         v1         1
                         v2         1
                         v3         4
                         v4         2
                         v5         1



I have written a small piece of R code, given below, that does the job (assuming "d" to be the initial data frame)

            b<-data.frame(NULL)
            i<-1
            x<-d[1,]
            while(i<dim(d)[1])
            {
                if(length(unique(x[,1]))==1)
                {
                    x<-rbind(x,d[i+1,])
                    i=i+1
                }
                if(length(unique(x[,1]))>1)
                {
                    y<-x[1:(nrow(x)-1),]
                    z<-which(y[,2]==min(y[,2]))
                    b<-rbind(b,y[z,])
                    x<-d[i,]
                }
            }
            z<-which(x[,2]==min(x[,2]))
            b<-rbind(b,x[z,])
            b

The code works properly and gives me the desired result, but I have to repeat this procedure for many data frames, and nearly every data frame contains approximately 15,000 entries in the first column, of which more than 12,500 are unique. Running the above code in a loop takes a considerable amount of time.
Can anybody suggest a faster approach?
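
For comparison, here is a minimal vectorised sketch of the same task. It is not a benchmarked solution. It assumes that rows should be grouped by value across the whole data frame (the loop above groups consecutive runs instead, which gives the same result when equal values are adjacent, as in the example). The names res_all, res_one and o are illustrative only.

```r
# Example data from the post.
v  <- c(rep("v1", 3), rep("v2", 4), rep("v3", 2), "v4", rep("v5", 6))
tt <- c(1, 2, 3, 3, 1, 2, 3, 4, 5, 2, 7, 9, 2, 3, 1, 4)
d  <- data.frame(v, tt)

# Variant 1: ave() returns, for every row, the minimum of tt within that
# row's group; rows equal to their group minimum are kept (ties included,
# as in the loop above).
res_all <- d[d$tt == ave(d$tt, d$v, FUN = min), ]

# Variant 2: sort by group and then by value, and keep the first row of each
# group; this keeps exactly one row per group even when the minimum is tied.
o <- d[order(d$v, d$tt), ]
res_one <- o[!duplicated(o$v), ]

print(res_all)
print(res_one)
```

Both variants avoid growing a data frame with rbind() inside a loop, which is likely where most of the time goes; whether they are fast enough for the real data would need to be checked with system.time().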

Regards

 Souvik Bandyopadhyay
Research Fellow,
Dept Of Statistics
Calcutta University




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat 17 May 2008 - 20:06:48 GMT
