Date: Fri, 27 Jun 2008 22:09:35 +0100

> This one should be easy but it's giving me a hard time mostly because tapply
> puts the results in a list. I want to calculate the cumulative sum of a
> variable in a dataframe, but with the accumulation only within each level of
> a factor. For a very simple example, take:

> df$willdo <- unlist(tapply(df$x, df$fac, cumsum))
> df$ideal <- df$willdo - df$x
> df
   x fac willdo ideal
1  1   a      1     0
2  1   a      2     1
3  1   a      3     2
4  1   a      4     3
5  1   a      5     4
6  2   b      2     0
7  2   b      4     2
8  2   b      6     4
9  2   b      8     6
10 2   b     10     8
11 3   c      3     0
12 3   c      6     3
13 3   c      9     6
14 3   c     12     9
15 3   c     15    12
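
One caveat: unlist(tapply(...)) returns the values grouped by factor level, so it lines up with the rows only when the data frame is already sorted by the factor, as it is in this example. A minimal sketch of an alternative using base R's ave(), which puts each group's result back in the original row positions, might look like this (the `shuffled` data frame is a hypothetical reordering added only to illustrate the point, not part of the original question):

```r
# Same example data as in the question.
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 fac = gl(3, 5, labels = letters[1:3]))

# ave() applies FUN within each level of the factor and returns a vector
# in the original row order, so no unlist() is needed.
df$willdo <- ave(df$x, df$fac, FUN = cumsum)
df$ideal  <- df$willdo - df$x

# Hypothetical reordering: rows are no longer grouped by factor level.
shuffled <- df[c(11, 1, 6, 12, 2, 7), c("x", "fac")]
shuffled$willdo <- ave(shuffled$x, shuffled$fac, FUN = cumsum)
shuffled$ideal  <- shuffled$willdo - shuffled$x
```

On the sorted data this should give the same columns as the tapply version; on the reordered rows the running totals still follow each level of fac.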

HTH
G

>
> > df <- data.frame(x=c(rep(1,5),rep(2,5),rep(3,5)),fac=gl(3,5,labels=letters[1:3]))
> > df
>    x fac
> 1  1   a
> 2  1   a
> 3  1   a
> 4  1   a
> 5  1   a
> 6  2   b
> 7  2   b
> 8  2   b
> 9  2   b
> 10 2   b
> 11 3   c
> 12 3   c
> 13 3   c
> 14 3   c
> 15 3   c
>
> I'd like to create another column in the dataframe so it looks like this,
> and make sure that the cumulative sums still match the right levels of the
> factor. I've included a "willdo" column that's just a cumulative sum, and
> an "ideal" column that's the cumulative sum minus the current value - the
> column headings are self explanatory.
>
> > answer
>    x fac willdo ideal
> 1  1   a      1     0
> 2  1   a      2     1
> 3  1   a      3     2
> 4  1   a      4     3
> 5  1   a      5     4
> 6  2   b      2     0
> 7  2   b      4     2
> 8  2   b      6     4
> 9  2   b      8     6
> 10 2   b     10     8
> 11 3   c      3     0
> 12 3   c      6     3
> 13 3   c      9     6
> 14 3   c     12     9
> 15 3   c     15    12

