From: Christoph Jäckel <christoph.jaeckel_at_wi.tum.de>

Date: Tue, 19 Apr 2011 17:40:17 +0200

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 19 Apr 2011 - 15:42:52 GMT

Date: Tue, 19 Apr 2011 17:40:17 +0200

Hi together,

I found a solution to my problem that works for me and performs reasonable well in my opinion. Instead of looping through both datasets, I decided to replicate each row in d2 from Min_Month to Max_Month and then use the match-function. Here is the code (I changed the original example to be able to compare the performance of the two solutions):

#Example for performance test

nobs <- 100 #Observation per company [26 companies]
d1 <- data.frame(cbind("ID"=rep(letters,each=nobs,1),"Month"=rep(1:nobs,26)),stringsAsFactors=FALSE)
d1$Month <- as.integer(d1$Month)

d1[,"Unique_ID"] <- mapply(paste,d1$ID,d1$Month,sep="_")

d2 <- data.frame(cbind("ID"=rep(letters,each=3,1),"Min_Month"=rep(c(1,as.integer(nobs/3),as.integer(2*nobs/3)),26),

"Max_Month"=rep(c(as.integer(nobs/3)-1,as.integer(2*nobs/3)-1,nobs),26),"Value1"=rep(1:3,26),

"Value2"=rep(4:6,26)),stringsAsFactors=FALSE)
d2$Min_Month <- as.integer(d2$Min_Month)
d2$Max_Month <- as.integer(d2$Max_Month)

#Possible solution; is there are more elegant one?
proc1 <- proc.time()

for (i in 1:nrow(d1)){

for (j in 1:nrow(d2)){

* if ((d1[i,"ID"] == d2[j,"ID"]) & (d1[i,"Month"]>= d2[j,"Min_Month"]) &
** (d1[i,"Month"]<= d2[j,"Max_Month"])) {break}
** }
** d1[i,"Value1"] <- d2[j,"Value1"]
** d1[i,"Value2"] <- d2[j,"Value2"]
*

}

proc1 <- proc.time() - proc1

#New Solution; Create rows first, then match it
proc2 <- proc.time()

createRows <- function(df){

* dates <- seq(df$Min_Month,df$Max_Month,by=1)
*

rows <- data.frame(ID = df$ID,date=dates, df[,c("Value1","Value2")])

* rows
*

}

library(plyr)

d2 <- ddply(d2,.(ID,Min_Month),createRows)

d2[,"Unique_ID"] <- mapply(paste,d2$ID,d2$date,sep="_") d1[,c("Value1_Alt", "Value2_Alt")] <- d2[match(d1$Unique_ID,d2$Unique_ID),c("Value1","Value2")]proc2 <- proc.time() - proc2

Hope that helps. If there is still a better solution, I would still be more than happy for an advice!

Christoph

On Sat, Apr 16, 2011 at 4:35 PM, Christoph Jäckel <christoph.jaeckel_at_wi.tum.de> wrote:

> > Hello R-Community, > > I have the following matching problem: I have two data.frames, one > with an observation every month (per company ID), and one with an > observation every quarter (per company ID; note that quarter means > fiscal quarter; therefore 1Q = Jan, Feb, Mar is not necessarily > correct and also, a fiscal quarter is not necessarily 3 month long). > > For every month and company, I want to get the correct value of that > quarter. Consequently, several months have the same value for one > quarter. As an example see the code below: > > #Monthly data > d1 <- data.frame(cbind(c(rep("A",5),rep("B",5)),c(1:5,1:5)),stringsAsFactors=FALSE) > names(d1) <- c("ID", "Month") > d1[,"Month"] <- as.integer(d1[,"Month"]) > > #Quarterly data, i.e. the value of every quarter has to be matched to > several months in d1 > #However, I want to match fiscal quarters, which means that one > quarter is not necessarily 3 month long > d2 <- data.frame(cbind(c("A","A","B","B"),c(1,3,1,4),c(2,5,3,5),c("v1","v2","v3","V4")),stringsAsFactors=FALSE) > names(d2) <- c("ID", "Min_Month", "Max_Month","Value") > d2[,"Min_Month"] <- as.integer(d2[,"Min_Month"]) > d2[,"Max_Month"] <- as.integer(d2[,"Max_Month"]) > > #Possible solution: Loop through every company and month and check in > which quarter (based on Min_Month and Max_Month) the current month is > for (i in 1:nrow(d1)){ > for (j in 1:nrow(d2)){ > if ((d1[i,"ID"] == d2[j,"ID"]) & (d1[i,"Month"]>= d2[j,"Min_Month"]) & > (d1[i,"Month"]<= d2[j,"Max_Month"])) {break} > } > d1[i,"Value"] <- d2[j,"Value"] > } > > The solution works, but I was hoping that there would be a more > efficient and elegant one. I checked "match" and "merge", but didn't > figure out how I could use those to match without a unique identifier. > > Any hints would be highly appreciated. > > Christoph > > -- > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Christoph Jäckel (Dipl.-Kfm.) > -------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Research Assistant > Chair for Financial Management and Capital Markets | Lehrstuhls für > Finanzmanagement und Kapitalmärkte > TUM School of Management | Technische Universität München ______________________________________________R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 19 Apr 2011 - 15:42:52 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Tue 19 Apr 2011 - 17:30:32 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*