Re: [R] merge dataframes with conditions formulated as logical expressions

From: Wolfram Fischer <wolfram_at_fischer-zim.ch>
Date: Thu 15 Jun 2006 - 15:06:09 EST

Your solution without concerning discontinuity is better because it is more general.

> Here is my solution that ignores that (and only uses MIN.VAL and
> completely disrespecting MAX.VAL). Not very elegant but should do
> the trick.
>
>
> df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
> dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1,
> 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
>
> x <- split(df, df$GRP)
> y <- split(dp, dp$GRP)
>
> out <- NULL
> for(g in names(x)){
>
> xx <- x[[g]]
> yy <- y[[g]]
>
> w <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
> tmp <- cbind(xx, yy[w, "VAL2"])
> colnames(tmp) <- c("GRP", "VAL", "VAL2")
> out <- rbind(out, tmp)
> }
> out
>
> Regards, Adai

Thanks for this solution.

I did not yet try to program a conventional solution because I thought there would be a nice shortcut in R to solve the problem comparably elegantly as in SQL:

	select df.*, dp.VAL2
	from df, dp
	where df.GRP = dp.GRP
	  and df.VAL > dp.MIN_VAL
	  and df.VAL <= dp.MAX_VAL

Wolfram

> On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
> > I have a data.frame df containing two variables:
> > GRP: Factor
> > VAL: num
> >
> > I have a data.frame dp containing:
> > GRP: Factor
> > MIN.VAL: num
> > MAX.VAL: num
> > VAL2: num
> > with several rows per "GRP"
> > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
> > within the same "GRP".
> >
> > I want to create df[i, "VAL2"] <- dpp[z, "VAL2"]
> > with i along df
> > and dpp <- subset( dp, GRP = df[i, "GRP"] )
> > so that it is true for each i:
> > df[i, "VAL"] > dpp[z, "MIN.VAL"]
> > and df[i, "VAL"] <= dpp[z, "MAX.VAL"]
> >
> > Is there an easy/efficient way to do that?
> >
> > Example:
> > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
> > dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
> > MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ),
> > VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> >
> > The result should be:
> > df$VAL2 <- c( 1.1, 2.2, 4.4 )
> >
> > Thanks - Wolfram
> >
> > ______________________________________________



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Jun 15 15:11:40 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 15 Jun 2006 - 16:12:02 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.