Re: [R] merge, cbind, or....?

From: Marc Schwartz <MSchwartz_at_medanalytics.com>
Date: Sat 24 Jul 2004 - 01:36:13 EST

On Fri, 2004-07-23 at 10:07, Bruno Cutayar wrote:
> Hi,
> i have two data.frame x and y like :
> > x <- data.frame( num = c(1:10), value = runif(10) )
> > y <- data.frame( num = c(6:10), value = runif(5) )
> and i want to obtain something like :
>
> num.x value.x num.y value.y
> 1 0.38423828 NA 0.2911089
> 2 0.17402507 NA 0.8455208
> 3 0.54443465 NA 0.8782199
> 4 0.04540406 NA 0.3202252
> 5 0.46052426 NA 0.7560559
> 6 0.61385464 6 0.2911089
> 7 0.48274968 7 0.8455208
> 8 0.11961778 8 0.8782199
> 9 0.64531394 9 0.3202252
> 10 0.92052805 10 0.7560559
>
> with NA in case of missing value for y to x.
>
> { for this example : i write simply
> > data.frame(num.x=c(1:10),
> value.x=x[[2]],num.y=c(rep(NA,5),6:10),value.y=y[[2]]) }
>
> I didn't find solution in merge(x,y,by="num") : missing rows are no keeping.
> Can't you help me ?
>
> thanks,
> Bruno

Using merge() with the argument 'all' set to TRUE will get you the following (my values differ from yours because runif() was not run with the same seed):

> merge(x, y, by = "num", all = TRUE)
   num    value.x   value.y
1    1 0.14057955        NA
2    2 0.60850644        NA
3    3 0.63410731        NA
4    4 0.07196253        NA
5    5 0.51869503        NA
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865
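
To get repeatable numbers for a result like the one above, one could fix the random seed before calling runif(). This is only a sketch: the seed value 42 and the object name 'merged' are arbitrary choices for illustration and do not come from the original post.

```r
# Fix the seed so runif() returns the same values on every run
# (42 is an arbitrary choice, not from the original post)
set.seed(42)
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# all = TRUE keeps every row of both data frames;
# entries without a partner in the other data frame become NA
merged <- merge(x, y, by = "num", all = TRUE)

# The shared non-key column 'value' receives the default suffixes .x and .y
print(names(merged))

# One row per distinct 'num', with NA in value.y where y has no match
print(nrow(merged))
print(is.na(merged$value.y))
```

Since every 'num' in y also occurs in x here, all.x = TRUE should give the same rows as all = TRUE for this particular data.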

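Setting 'all = TRUE' keeps the rows of 'x' that have no match in 'y' (and vice versa) and fills the missing entries with NA. The default is FALSE, which is why your merge(x, y, by = "num") call dropped those rows. Note, however, that the value.y column is not replicated for the first five rows, as it is in your example above. If that is what you want, you could do something like the following: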

> cbind(x, y$value)
   num      value   y$value
1    1 0.14057955 0.3340535
2    2 0.60850644 0.9340489
3    3 0.63410731 0.5417780
4    4 0.07196253 0.2214993
5    5 0.51869503 0.4947865
6    6 0.57042428 0.3340535
7    7 0.85874426 0.9340489
8    8 0.03608417 0.5417780
9    9 0.24422205 0.2214993
10  10 0.03383263 0.4947865

which takes advantage of the recycling of y$value, since it is shorter than the number of rows in 'x'. In this case, the five values of y$value are used twice to fill the ten rows.

HTH,

Marc Schwartz
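
P.S. The recycling works here because the 10 rows of 'x' are a whole multiple of the 5 values in y$value; with a length that does not divide evenly, data.frame() and cbind() on a data frame stop with an error. The sketch below spells the recycling out with rep() and also builds the num.y column from the original question using match(). The seed and the names 'recycled' and 'wanted' are my own assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, assumed only for repeatability
x <- data.frame(num = 1:10, value = runif(10))
y <- data.frame(num = 6:10, value = runif(5))

# Spell out the recycling instead of relying on cbind() to do it silently
recycled <- rep(y$value, length.out = nrow(x))

# match() returns NA where a 'num' in x has no counterpart in y,
# so indexing y$num with it yields NA for rows 1-5 and 6:10 afterwards
wanted <- data.frame(
  num.x   = x$num,
  value.x = x$value,
  num.y   = y$num[match(x$num, y$num)],
  value.y = recycled
)
print(wanted)
```

Writing rep() explicitly makes the intent visible to a later reader, whereas the implicit recycling in cbind() is easy to mistake for a matched join.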



R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jul 24 01:45:38 2004
