Re: [R] summing values by week - based on daily dates - but with some dates missing

From: Dimitri Liakhovitski <dimitri.liakhovitski_at_gmail.com>
Date: Wed, 30 Mar 2011 17:35:22 -0400

Henrique, this is beautiful, thank you so much. This is a great and correct solution.

A stupid question: what does the line is.na(wk) <- wk %% 1 == 0 do? Thank you!
Dimitri
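
A minimal sketch of what that line appears to do, traced on a made-up week that straddles 2008 and 2009. The short `dates` vector and the names `wk_na` and `filled` are hypothetical, introduced only for illustration, and the base-R carry-forward on the last line is a stand-in for zoo's na.locf(), not the code from the thread.

```r
# Seven consecutive days, Monday 2008-12-29 through Sunday 2009-01-04
# (hypothetical mini example, not the thread's myframe).
dates <- seq(as.Date("2008-12-29"), by = "day", length.out = 7)

# "%W" is the week of the year with Monday as the first day of the week.
# Days that fall before the first Monday of a year get week "00",
# so 2009-01-01 .. 2009-01-04 come out as 2009.00, i.e. the number 2009.
wk <- as.numeric(format(dates, "%Y.%W"))

# wk %% 1 is the fractional part; it is exactly 0 only for week "00".
# The replacement form `is.na(x) <- cond` sets x[cond] to NA.
wk_na <- wk
is.na(wk_na) <- wk_na %% 1 == 0

# Carry the last non-missing week label forward (base-R stand-in for
# zoo::na.locf; assumes the first element is not NA).
filled <- wk_na[!is.na(wk_na)][cumsum(!is.na(wk_na))]

print(data.frame(dates, wk, wk_na, filled))
```

So the week-00 days at the start of 2009 are blanked out and then relabelled with the last week of 2008, which is what joins the straddling week. Note that the carry-forward follows row order, so this presumably relies on the rows being sorted by date within each group.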

On Wed, Mar 30, 2011 at 5:25 PM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
> You're right:
>
> wk <- as.numeric(format(myframe$dates, "%Y.%W"))
> is.na(wk) <- wk %% 1 == 0
> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>
>
> On Wed, Mar 30, 2011 at 6:10 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski_at_gmail.com> wrote:
>> Yes, zoo! That's what I forgot. It's great.
>> Henrique, thanks a lot! One question:
>>
>> if the data are as I originally posted - then week numbered 52 is
>> actually the very first week (it straddles 2008-2009).
>> What if the data are much longer (as in the code below: same as before,
>> but with more dates), so that we have more than 1 year to deal with?
>> It looks like this code is lumping everything into 52 weeks. And my
>> goal is to keep each week independent. If I have 2 years, then it
>> should be 100+ weeks. Makes sense?
>> Thank you!
>>
>> ### Creating a longer example data set:
>> mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
>> myfactor<-c(rep("group.1",500),rep("group.2",500))
>> set.seed(123)
>> myvalues<-runif(1000,0,1)
>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>> (myframe)
>> dim(myframe)
>>
>> ## Removing some rows (dates) unsystematically:
>> set.seed(123)
>> removed.group1<-sample(1:500,size=150,replace=F)
>> set.seed(456)
>> removed.group2<-sample(501:1000,size=150,replace=F)
>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>> to.remove<-to.remove[order(to.remove)]
>> myframe<-myframe[-to.remove,]
>> (myframe)
>> dim(myframe)
>> names(myframe)
>>
>> library(zoo)
>> wk <- as.numeric(format(myframe$dates, '%W'))
>> is.na(wk) <- wk == 0
>> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>> solution<-solution[order(solution$group),]
>> write.csv(solution,file="test.csv",row.names=F)
>>
>>
>>
>> On Wed, Mar 30, 2011 at 4:45 PM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
>>> Try this:
>>>
>>> library(zoo)
>>> wk <- as.numeric(format(myframe$dates, '%W'))
>>> is.na(wk) <- wk == 0
>>> aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>>>
>>>
>>>
>>> On Wed, Mar 30, 2011 at 4:35 PM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski_at_gmail.com> wrote:
>>>> Henrique, this is great, thank you!
>>>>
>>>> It's almost what I was looking for! Only one small thing - it doesn't
>>>> "merge" the results for weeks that "straddle" 2 years. In my example -
>>>> last week of year 2008 and the very first week of 2009 are one week.
>>>> Any way to "join them"?
>>>> Asking because in reality I'll have many years and hundreds of groups
>>>> - hence, it'll be hard to do it manually.
>>>>
>>>>
>>>> BTW - does format(dates,"%Y.%W") always consider weeks as starting with Mondays?
>>>>
>>>> Thank you very much!
>>>> Dimitri
>>>>
>>>>
>>>> On Wed, Mar 30, 2011 at 2:55 PM, Henrique Dallazuanna <wwwhsd_at_gmail.com> wrote:
>>>>> Try this:
>>>>>
>>>>> aggregate(value ~ group + format(dates, "%Y.%W"), myframe, FUN = sum)
>>>>>
>>>>>
>>>>> On Wed, Mar 30, 2011 at 11:23 AM, Dimitri Liakhovitski
>>>>> <dimitri.liakhovitski_at_gmail.com> wrote:
>>>>>> Dear everybody,
>>>>>>
>>>>>> I have the following challenge. I have a data set with 2 subgroups,
>>>>>> dates (days), and corresponding values (see example code below).
>>>>>> Within each subgroup: I need to aggregate (sum) the values by week -
>>>>>> for weeks that start on a Monday (for example, 2008-12-29 was a
>>>>>> Monday).
>>>>>> I find it difficult because I have missing dates in my data - so that
>>>>>> sometimes I don't even have the date for some Mondays. So, I can't
>>>>>> write a proper loop.
>>>>>> I want my output to look something like this:
>>>>>> group   dates   value
>>>>>> group.1 2008-12-29  3.0937
>>>>>> group.1 2009-01-05  3.8833
>>>>>> group.1 2009-01-12  1.362
>>>>>> ...
>>>>>> group.2 2008-12-29  2.250
>>>>>> group.2 2009-01-05  1.4057
>>>>>> group.2 2009-01-12  3.4411
>>>>>> ...
>>>>>>
>>>>>> Thanks a lot for your suggestions! The code is below:
>>>>>> Dimitri
>>>>>>
>>>>>> ### Creating example data set:
>>>>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 43, by = "day"),2)
>>>>>> myfactor<-c(rep("group.1",43),rep("group.2",43))
>>>>>> set.seed(123)
>>>>>> myvalues<-runif(86,0,1)
>>>>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>>>>> (myframe)
>>>>>> dim(myframe)
>>>>>>
>>>>>> ## Removing some rows (dates) unsystematically:
>>>>>> set.seed(123)
>>>>>> removed.group1<-sample(1:43,size=11,replace=F)
>>>>>> set.seed(456)
>>>>>> removed.group2<-sample(44:86,size=11,replace=F)
>>>>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>>>>> to.remove<-to.remove[order(to.remove)]
>>>>>> myframe<-myframe[-to.remove,]
>>>>>> (myframe)
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Dimitri Liakhovitski
>>>>>> Ninah Consulting
>>>>>> www.ninah.com
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help_at_r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Henrique Dallazuanna
>>>>> Curitiba-Paraná-Brasil
>>>>> 25° 25' 40" S 49° 16' 22" O

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

Received on Wed 30 Mar 2011 - 21:39:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Mar 2011 - 22:10:26 GMT.
