From: Lana Schaffer <schaffer_at_scripps.edu>

Date: Fri, 13 Jun 2008 09:52:50 -0700

Jim,

d.frame[[i]] is a list of data.frames and seqFile is a
data.frame. I have converted them to vectors/matrices and
the timing is the same as with data.frames. 'index' is unique
in both structures. The list is subset into data.frame/matrix
structures.

Lana
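
Since 'index' is reported to be unique in both structures, one possible alternative to calling match() once per list element is to stack the list, match once, and split the result back. The sketch below is only an illustration under assumptions: the data are small, randomly generated stand-ins, and the names d.list, chr, pos and value are hypothetical, not taken from the thread. Whether this is actually faster on the real data would need to be checked with system.time() or Rprof.

```r
# Hypothetical stand-ins for the objects in the thread (sizes reduced).
set.seed(42)
seqFile <- data.frame(index = 1:500,
                      chr   = sample(1:22, 500, replace = TRUE),
                      pos   = sample.int(1e6, 500))

# A list of small data frames, each with a unique 'index' column.
d.list <- lapply(1:20, function(i) {
  data.frame(index = sample(seqFile$index, 50), value = rnorm(50))
})

# Per-element approach from the thread, kept for comparison.
mergefunc <- function(x, seqFile) {
  cbind(x, seqFile[match(x$index, seqFile$index), ])
}
per_element <- lapply(d.list, mergefunc, seqFile = seqFile)

# Single-pass sketch: stack the list, match once, then split back.
sizes    <- vapply(d.list, nrow, integer(1))
stacked  <- do.call(rbind, d.list)
hit      <- match(stacked$index, seqFile$index)
# Drop the key column from the lookup so 'index' is not duplicated.
combined <- cbind(stacked, seqFile[hit, c("chr", "pos")])
group    <- rep(seq_along(d.list), times = sizes)
single_pass <- split(combined, group)
```

The list single_pass has one data frame per original element, in the original order; the rows carry the same chr and pos values as the per-element version, although the row names differ.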

-----Original Message-----

From: jim holtman [mailto:jholtman_at_gmail.com]
Sent: Friday, June 13, 2008 9:45 AM

To: Lana Schaffer

Cc: r-help_at_r-project.org

Subject: Re: [R] alternative to matching/merge?

What is the structure of 'd.frame' and 'seqFile'? Run Rprof so that we can see which of the functions it is spending its time in. What happens if x$index is not in seqFile$index? Are the values in the 'index' unique in both structures? Subsetting a data frame can be expensive when compared to using a matrix. Could you use a matrix instead of a data frame; are all the columns the same mode? Again either a subset of data would be helpful or an 'str' on the data objects being used so that we can understand what they are.
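
The checks asked for above could be run roughly as follows. This is a minimal sketch on made-up data: the objects below are small, hypothetical stand-ins for 'seqFile' and one element of 'd.frame', and the column names chr, pos and value are assumptions. It uses only base R (str, anyDuplicated, %in%, Rprof, summaryRprof).

```r
# Hypothetical stand-ins for the objects in the thread (sizes reduced).
set.seed(1)
seqFile <- data.frame(index = 1:1000,
                      chr   = sample(1:22, 1000, replace = TRUE),
                      pos   = sample.int(1e6, 1000))
x <- data.frame(index = sample(1:1200, 200), value = rnorm(200))

# Structure of the objects.
str(seqFile)
str(x)

# Are the keys unique, and how many x$index values are missing
# from seqFile$index? (match() returns NA for those.)
index_unique <- !anyDuplicated(x$index) && !anyDuplicated(seqFile$index)
n_unmatched  <- sum(!(x$index %in% seqFile$index))

# Profile the lookup to see which functions take the time.
prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)
for (i in 1:500) {
  res <- cbind(x, seqFile[match(x$index, seqFile$index), ])
}
Rprof(NULL)

# summaryRprof() stops with an error if no samples were recorded,
# so guard the call in this small example.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

Rows of x whose index is absent from seqFile$index come back with NA in the seqFile columns, which is one answer to the "what happens if x$index is not in seqFile$index" question.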

On Fri, Jun 13, 2008 at 12:03 PM, Lana Schaffer <schaffer_at_scripps.edu>
wrote:

> Jim,
>
> My code is this:
>
> mergefunc <- function(x,seqFile){
>   # merge(seqFile,x)
>   cbind(x, seqFile[ match(as.vector(x$index), as.vector(seqFile$index)), ])
> }
> LIX <- lapply(d.frame[[1]], mergefunc,seqFile=seqFile)
>
> Each matrix/data.frame takes 0.2 seconds and then to do this 1240 times
> takes ~4 minutes.
> Thanks,
> Lana
>
> -----Original Message-----
> From: jim holtman [mailto:jholtman_at_gmail.com]
> Sent: Thursday, June 12, 2008 6:40 PM
> To: Lana Schaffer
> Cc: r-help_at_r-project.org
> Subject: Re: [R] alternative to matching/merge?
>
> It would be nice if you at least included the code that you are using
> and a subset of the data. Have you run Rprof to determine which of
> the functions is consuming the time?
>
> On Thu, Jun 12, 2008 at 3:25 PM, Lana Schaffer <schaffer_at_scripps.edu>
> wrote:
>>
>> Greetings,
>> I am doing matching/merge for a table (40919x3) to data which is in
>> the form of a list of 1268 data.frames. Using lapply this is taking
>> ~5 minutes. I know that the match/merge functions are time
>> consuming, so is there an alternative to accomplish this goal? Is
>> lapply not efficient?
>>
>> Lana Schaffer
>> Biostatistics/Informatics
>> The Scripps Research Institute
>> DNA Array Core Facility
>> La Jolla, CA 92037
>> (858) 784-2263
>> (858) 784-2994
>> schaffer_at_scripps.edu
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 13 Jun 2008 - 20:08:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Fri 13 Jun 2008 - 20:31:55 GMT.
