Re: [R] Matching Problem: Want to match to data.frame with inexact matching identifier (one identifier has to be in the range of the other).

From: Christoph Jäckel <christoph.jaeckel_at_wi.tum.de>
Date: Tue, 19 Apr 2011 17:40:17 +0200

Hi together,

I found a solution to my problem that works for me and performs reasonable well in my opinion. Instead of looping through both datasets, I decided to replicate each row in d2 from Min_Month to Max_Month and then use the match-function. Here is the code (I changed the original example to be able to compare the performance of the two solutions):

#Example for performance test
nobs <- 100  #Observation per company [26 companies] d1 <- data.frame(cbind("ID"=rep(letters,each=nobs,1),"Month"=rep(1:nobs,26)),stringsAsFactors=FALSE) d1$Month <- as.integer(d1$Month)
d1[,"Unique_ID"] <- mapply(paste,d1$ID,d1$Month,sep="_")

d2 <- data.frame(cbind("ID"=rep(letters,each=3,1),"Min_Month"=rep(c(1,as.integer(nobs/3),as.integer(2*nobs/3)),26),
      "Max_Month"=rep(c(as.integer(nobs/3)-1,as.integer(2*nobs/3)-1,nobs),26),"Value1"=rep(1:3,26),
"Value2"=rep(4:6,26)),stringsAsFactors=FALSE) d2$Min_Month <- as.integer(d2$Min_Month) d2$Max_Month <- as.integer(d2$Max_Month)

#Possible solution; is there are more elegant one? proc1 <- proc.time()
for (i in 1:nrow(d1)){
  for (j in 1:nrow(d2)){
    if ((d1[i,"ID"] == d2[j,"ID"]) & (d1[i,"Month"]>= d2[j,"Min_Month"]) &
    (d1[i,"Month"]<= d2[j,"Max_Month"])) {break}
  }
  d1[i,"Value1"] <- d2[j,"Value1"]
  d1[i,"Value2"] <- d2[j,"Value2"]
}
proc1 <- proc.time() - proc1

#New Solution; Create rows first, then match it proc2 <- proc.time()
createRows <- function(df){
  dates <- seq(df$Min_Month,df$Max_Month,by=1)
  rows <- data.frame(ID = df$ID,date=dates, df[,c("Value1","Value2")])
  rows

}
library(plyr)
d2 <- ddply(d2,.(ID,Min_Month),createRows)

d2[,"Unique_ID"] <- mapply(paste,d2$ID,d2$date,sep="_")
d1[,c("Value1_Alt", "Value2_Alt")] <-
d2[match(d1$Unique_ID,d2$Unique_ID),c("Value1","Value2")]
proc2 <- proc.time() - proc2

Hope that helps. If there is still a better solution, I would still be more than happy for an advice!

Christoph

On Sat, Apr 16, 2011 at 4:35 PM, Christoph Jäckel <christoph.jaeckel_at_wi.tum.de> wrote:

>
> Hello R-Community,
>
> I have the following matching problem: I have two data.frames, one
> with an observation every month (per company ID), and one with an
> observation every quarter (per company ID; note that quarter means
> fiscal quarter; therefore 1Q = Jan, Feb, Mar is not necessarily
> correct and also, a fiscal quarter is not necessarily 3 month long).
>
> For every month and company, I want to get the correct value of that
> quarter. Consequently, several months have the same value for one
> quarter. As an example see the code below:
>
> #Monthly data
> d1 <- data.frame(cbind(c(rep("A",5),rep("B",5)),c(1:5,1:5)),stringsAsFactors=FALSE)
> names(d1) <- c("ID", "Month")
> d1[,"Month"] <- as.integer(d1[,"Month"])
>
> #Quarterly data, i.e. the value of every quarter has to be matched to
> several months in d1
> #However, I want to match fiscal quarters, which means that one
> quarter is not necessarily 3 month long
> d2 <- data.frame(cbind(c("A","A","B","B"),c(1,3,1,4),c(2,5,3,5),c("v1","v2","v3","V4")),stringsAsFactors=FALSE)
> names(d2) <- c("ID", "Min_Month", "Max_Month","Value")
> d2[,"Min_Month"] <- as.integer(d2[,"Min_Month"])
> d2[,"Max_Month"] <- as.integer(d2[,"Max_Month"])
>
> #Possible solution: Loop through every company and month and check in
> which quarter (based on Min_Month and Max_Month) the current month is
> for (i in 1:nrow(d1)){
>  for (j in 1:nrow(d2)){
>    if ((d1[i,"ID"] == d2[j,"ID"]) & (d1[i,"Month"]>= d2[j,"Min_Month"]) &
>    (d1[i,"Month"]<= d2[j,"Max_Month"])) {break}
>  }
>  d1[i,"Value"] <- d2[j,"Value"]
> }
>
> The solution works, but I was hoping that there would be a more
> efficient and elegant one. I checked "match" and "merge", but didn't
> figure out how I could use those to match without a unique identifier.
>
> Any hints would be highly appreciated.
>
> Christoph
>
> --
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Christoph Jäckel (Dipl.-Kfm.)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Research Assistant
> Chair for Financial Management and Capital Markets | Lehrstuhls für
> Finanzmanagement und Kapitalmärkte
> TUM School of Management | Technische Universität München

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 19 Apr 2011 - 15:42:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 19 Apr 2011 - 17:30:32 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive