Re: [R] summing values by week - based on daily dates - but with some dates missing

From: Dimitri Liakhovitski <dimitri.liakhovitski_at_gmail.com>
Date: Wed, 30 Mar 2011 17:10:30 -0400

Yes, zoo! That's what I forgot. It's great. Henrique, thanks a lot! One question:

If the data are as I originally posted, then the week numbered 52 is actually the very first week (it straddles 2008-2009). What if the data are much longer (as in the code below: same as before, but more dates), so that we have more than one year to deal with? It looks like this code is lumping everything into 52 weeks, whereas my goal is to keep each week independent. If I have 2 years, then there should be 100+ weeks. Makes sense?
Thank you!

### Creating a longer example data set:
mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
myfactor<-c(rep("group.1",500),rep("group.2",500))
set.seed(123)
myvalues<-runif(1000,0,1)
myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
(myframe)
dim(myframe)

## Removing some rows (dates) unsystematically:
set.seed(123)
removed.group1<-sample(1:500,size=150,replace=FALSE)
set.seed(456)
removed.group2<-sample(501:1000,size=150,replace=FALSE)
to.remove<-c(removed.group1,removed.group2);length(to.remove)
to.remove<-to.remove[order(to.remove)]
myframe<-myframe[-to.remove,]
(myframe)
dim(myframe)
names(myframe)

library(zoo)
wk <- as.numeric(format(myframe$dates, '%W'))
is.na(wk) <- wk == 0
solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
solution<-solution[order(solution$group),]
write.csv(solution,file="test.csv",row.names=FALSE)
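
One possible way to keep every week separate across years is to key each row by the Monday that starts its week, rather than by a week number that restarts every January. The following is only a minimal sketch in base R, not something proposed in this thread: it rebuilds a frame with the same columns as myframe, and the names df, days_since_monday, week_start and weekly are made up for illustration. It assumes that format(x, "%u") returns the weekday as 1 (Monday) through 7 (Sunday), as documented in ?strptime.

```r
# Rebuild a frame with the same columns as myframe (dates, group, value).
dates <- seq(as.Date("2008-12-29"), length = 500, by = "day")
set.seed(123)
df <- data.frame(dates = rep(dates, 2),
                 group = rep(c("group.1", "group.2"), each = 500),
                 value = runif(1000, 0, 1))
df <- df[-sample(nrow(df), 300), ]   # drop rows unsystematically

# Days elapsed since the most recent Monday: %u gives 1 (Mon) .. 7 (Sun).
days_since_monday <- as.integer(format(df$dates, "%u")) - 1L

# Hypothetical helper column: the Monday that starts each row's week.
df$week_start <- df$dates - days_since_monday

# Sum within group and week; weeks from different years stay distinct
# because they are identified by a full date, not by a week number.
weekly <- aggregate(value ~ group + week_start, data = df, FUN = sum)
weekly <- weekly[order(weekly$group, weekly$week_start), ]
head(weekly)
```

Because week_start is a real date, a week that straddles two years (such as the one starting on 2008-12-29) is kept as a single week, and a missing Monday in the data does not matter: the key is computed from whichever days of that week are present.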

On Wed, Mar 30, 2011 at 4:45 PM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
> Try this:
>
> library(zoo)
> wk <- as.numeric(format(myframe$dates, '%W'))
> is.na(wk) <- wk == 0
> aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>
>
>
> On Wed, Mar 30, 2011 at 4:35 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski_at_gmail.com> wrote:
>> Henrique, this is great, thank you!
>>
>> It's almost what I was looking for! Only one small thing - it doesn't
>> "merge" the results for weeks that "straddle" 2 years. In my example -
>> last week of year 2008 and the very first week of 2009 are one week.
>> Any way to "join them"?
>> Asking because in reality I'll have many years and hundreds of groups
>> - hence, it'll be hard to do it manually.
>>
>>
>> BTW - does format(dates,"%Y.%W") always consider weeks as starting with Mondays?
>>
>> Thank you very much!
>> Dimitri
>>
>>
>> On Wed, Mar 30, 2011 at 2:55 PM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
>>> Try this:
>>>
>>> aggregate(value ~ group + format(dates, "%Y.%W"), myframe, FUN = sum)
>>>
>>>
>>> On Wed, Mar 30, 2011 at 11:23 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski_at_gmail.com> wrote:
>>>> Dear everybody,
>>>>
>>>> I have the following challenge. I have a data set with 2 subgroups,
>>>> dates (days), and corresponding values (see example code below).
>>>> Within each subgroup: I need to aggregate (sum) the values by week -
>>>> for weeks that start on a Monday (for example, 2008-12-29 was a
>>>> Monday).
>>>> I find it difficult because I have missing dates in my data - so that
>>>> sometimes I don't even have the date for some Mondays. So, I can't
>>>> write a proper loop.
>>>> I want my output to look something like this:
>>>> group   dates   value
>>>> group.1 2008-12-29  3.0937
>>>> group.1 2009-01-05  3.8833
>>>> group.1 2009-01-12  1.362
>>>> ...
>>>> group.2 2008-12-29  2.250
>>>> group.2 2009-01-05  1.4057
>>>> group.2 2009-01-12  3.4411
>>>> ...
>>>>
>>>> Thanks a lot for your suggestions! The code is below:
>>>> Dimitri
>>>>
>>>> ### Creating example data set:
>>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 43, by = "day"),2)
>>>> myfactor<-c(rep("group.1",43),rep("group.2",43))
>>>> set.seed(123)
>>>> myvalues<-runif(86,0,1)
>>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>>> (myframe)
>>>> dim(myframe)
>>>>
>>>> ## Removing some rows (dates) unsystematically:
>>>> set.seed(123)
>>>> removed.group1<-sample(1:43,size=11,replace=F)
>>>> set.seed(456)
>>>> removed.group2<-sample(44:86,size=11,replace=F)
>>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>>> to.remove<-to.remove[order(to.remove)]
>>>> myframe<-myframe[-to.remove,]
>>>> (myframe)
>>>>
>>>>
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>> Ninah Consulting
>>>> www.ninah.com
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Henrique Dallazuanna
>>> Curitiba-Paraná-Brasil
>>> 25° 25' 40" S 49° 16' 22" O
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

Received on Wed 30 Mar 2011 - 21:13:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Mar 2011 - 23:40:26 GMT.
