From: hadley wickham <h.wickham_at_gmail.com>

Date: Mon, 21 Jul 2008 23:08:19 -0500

Using Jim's index with my method gives you the best of both worlds:

x <- matrix(sample(20, 1e6 * 3, replace = TRUE), ncol = 3)

# Jim's method: split the row numbers by run, then sum each run
system.time({
  dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  # sum up column 3 and output the first two columns with the indices
  result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect) {
    c(x[.sect[1], 1:2], sum(x[.sect, 3]))
  })
  a <- do.call(rbind, result)
})

# Jim's run index combined with tapply
system.time({
  index <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  b <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
})

all.equal(a, b)

On my computer, Jim's method took 60 seconds and mine took 16.
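
As a quick check of what the run index does, here is a minimal sketch on the 9-row example matrix from Ralph's original post. The name `key` is introduced only for this illustration, and the key assumes both columns hold positive integers (otherwise two different pairs could map to the same value).

```r
# Ralph's example matrix from earlier in the thread
x <- matrix(c(1, 1, 1, 2, 2, 3, 3, 1, 1,
              7, 7, 7, 4, 4, 2, 2, 7, 7,
              1, 1, 1, 1, 1, 1, 2, 5, 5), 9, 3)

# one number per (column 1, column 2) pair; assumes positive integers
key <- x[, 2] + x[, 1] * max(x[, 2])

# the index goes up by one each time the key changes between adjacent rows,
# so a pair that reappears later in the matrix starts a new group
index <- cumsum(c(0, diff(key) != 0))
index
# [1] 0 0 0 1 1 2 2 3 3

# first row of each run, plus the sum of column 3 within that run
b <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
unname(b)
```

This gives the four rows Ralph asked for (1 7 3, 2 4 2, 3 2 3, 1 7 10). Grouping by `key` instead of `index` would merge the first and last runs and give 1 7 13, the result from my earlier reply.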

Hadley

On Sun, Jul 20, 2008 at 8:41 PM, Ralph S. <ruffel1_at_hotmail.com> wrote:

>
> yes - thank you very much! slowly getting to the full power of R . . .
>
> ----------------------------------------
>> Date: Sun, 20 Jul 2008 21:21:35 -0400
>> From: jholtman_at_gmail.com
>> To: ruffel1_at_hotmail.com
>> Subject: Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
>> CC: h.wickham_at_gmail.com; r-help_at_r-project.org
>>
>> Does this do what you want:
>>
>> > # following up on another idea that was presented
>> > # where are the breaks
>> > dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
>> > # sum up column 3 and output the first two columns with the indices
>> > result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
>> +     c(x[.sect[1], 1:2], sum(x[.sect, 3]))
>> + })
>> > do.call(rbind, result)
>>   [,1] [,2] [,3]
>> 0    1    7    3
>> 1    2    4    2
>> 2    3    2    3
>> 3    1    7   10
>>
>>
>> On Sun, Jul 20, 2008 at 7:57 PM, Ralph S. wrote:
>>>
>>> The first and second column are actually indices of another matrix (my example may make this not sufficiently clear). I want to compare the sum with that corresponding entry, and then record the result of that.
>>>
>>> Any idea?
>>>
>>> Best,
>>>
>>> Ralph
>>>
>>>
>>>
>>> ----------------------------------------
>>>> Date: Sun, 20 Jul 2008 16:50:41 -0700
>>>> From: h.wickham_at_gmail.com
>>>> To: ruffel1_at_hotmail.com
>>>> Subject: Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
>>>> CC: r-help_at_r-project.org
>>>>
>>>> On Sun, Jul 20, 2008 at 4:47 PM, hadley wickham wrote:
>>>>> On Sun, Jul 20, 2008 at 4:16 PM, Ralph S. wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to calculate the sum for each occurrence of the level of a factor in a very large matrix. In addition, I want to save that sum together with the information of the level of the factor and the level of a second factor.
>>>>>>
>>>>>> My matrix looks like this:
>>>>>>
>>>>>> x<-matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5),9,3)
>>>>>>
>>>>>> I want to sum according to the levels in the first column and save the sum with the information of the level in the first and the second column in a new matrix.
>>>>>>
>>>>>> That is, I want output in the matrix of form:
>>>>>>
>>>>>> 1 7 3
>>>>>> 2 4 2
>>>>>> 3 2 3
>>>>>> 1 7 10
>>>>>>
>>>>>
>>>>> Why that and not:
>>>>>
>>>>> 1 7 13
>>>>> 2 4 2
>>>>> 3 2 3
>>>>>
>>>>> ?
>>>>
>>>> Here's a solution for that case:
>>>>
>>>> index <- x[, 2] + x[, 1] * max(x[, 2])
>>>> cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
>>>>
>>>> It takes about half a second for a million row matrix.
>>>>
>>>> Hadley
>>>>
>>>>
>>>>
>>>> --
>>>> http://had.co.nz/
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?

--
http://had.co.nz/

Received on Tue 22 Jul 2008 - 04:12:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Fri 01 Aug 2008 - 18:33:06 GMT.
