Re: [R] (Nothing to do with) merge problem... extra lines appear in the presence of NAs

From: Sean O'Riordain <seanpor_at_acm.org>
Date: Sat 20 May 2006 - 23:04:45 EST

Apologies all! Thank you Brian
Sean

On 20/05/06, Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote:
> I think you forgot to read over your own message before sending it: take a
> look at a1 which has FOUR rows with mdate == 2005-06-09. Those correspond
> to rows 9:12 in the result, as you are merging on 'mdate'.
>
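A minimal sketch of the point above, assuming only the 'mdate' columns as printed further down in the thread (the 'value' and 'value2' numbers here are rounded placeholders, not the posted ones). merge() returns one row per matching pair of keys, so four rows of a1 sharing one date each pair with the single row of a2 for that date:

```r
# Key columns rebuilt from the a1 and a2 printed in the thread.
# The numeric values are placeholders; only 'mdate' drives the row count.
d0 <- as.Date("2005-06-01")
a1 <- data.frame(mdate = d0 + c(8, 1, 2, 8, 4, 5, 6, 8, 8, 9),
                 value = c(NA, 0.53, 0.76, NA, 0.10, 0.78, 0.30, NA, 0.74, 0.21))
a2 <- data.frame(mdate = d0 + 0:9,
                 value2 = seq(0.1, 1, by = 0.1))

# Which keys occur more than once in a1?
dup_keys <- unique(a1$mdate[duplicated(a1$mdate)])
print(dup_keys)                    # the date 2005-06-09
print(sum(a1$mdate == dup_keys))   # 4

# 6 dates matched once + 4 rows for 2005-06-09 + 3 dates only in a2 = 13
atot <- merge(a1, a2, by = "mdate", all = TRUE)
print(nrow(atot))                  # 13
```

Checking duplicated() on the key column before merging is a quick way to tell whether extra rows come from merge() or from the input data.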
> Your example is not reproducible, of course, since you used random values.
> Perhaps you intended
>
> a1[floor(runif(nacount)*count), "value"] <- NA
>
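One possible way to make the example reproducible, offered as a sketch rather than as what either poster ran: fix the random seed with set.seed() and draw the row numbers with sample(). The seed value 42 and the use of sample() are assumptions added here. Note that floor(runif(nacount)*count) yields values in 0..count-1, so an index of 0 is silently dropped, row 'count' is never chosen, and repeated indices can leave fewer than nacount NAs.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
count   <- 10
nacount <- 3

a1 <- data.frame(mdate = as.Date("2005-06-01") + 0:(count - 1),
                 value = runif(count))

# sample() draws nacount distinct row numbers from 1..count
idx <- sample(count, nacount)

# Index the row and the column in a single call, so only 'value' is modified
a1[idx, "value"] <- NA

print(sum(is.na(a1$value)))       # 3
print(any(duplicated(a1$mdate)))  # FALSE
```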
>
> On Sat, 20 May 2006, Sean O'Riordain wrote:
>
> > Good morning!
>
> [Or afternoon in Europe, ....]
>
> > I've searched the docs etc... Am I doing something wrong or is this a bug?
> >
> > I'm doing a merge of two dataframes and getting extra rows in the
> > resulting dataframe - the dataframes being merged might have NAs...
> >
> > count <- 10
> > nacount <- 3
> > a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> > names(a1) <- "mdate"
> > a1$value <- runif(count)
> > a1[floor(runif(nacount)*count),]$value <- NA
> >
> > a2 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> > names(a2) <- "mdate"
> > a2$value2 <- runif(count)
> > #a2[floor(runif(nacount)*count),]$value2 <- NA
> >
> >> a1
> >         mdate     value
> > 1  2005-06-09        NA
> > 2  2005-06-02 0.5287683
> > 3  2005-06-03 0.7563833
> > 4  2005-06-09        NA
> > 5  2005-06-05 0.1027646
> > 6  2005-06-06 0.7775884
> > 7  2005-06-07 0.2993592
> > 8  2005-06-09        NA
> > 9  2005-06-09 0.7434682
> > 10 2005-06-10 0.2096477
> >> a2
> >         mdate    value2
> > 1  2005-06-01 0.5347852
> > 2  2005-06-02 0.9322765
> > 3  2005-06-03 0.9106499
> > 4  2005-06-04 0.6810564
> > 5  2005-06-05 0.5871867
> > 6  2005-06-06 0.8123808
> > 7  2005-06-07 0.9675379
> > 8  2005-06-08 0.9470369
> > 9  2005-06-09 0.7493767
> > 10 2005-06-10 0.8864103
> >> atot <- merge(a1,a2,all=T)
> >
> > However, I find the following results to be quite un-intuitive - are
> > they correct? May I draw your attention to lines 9:12... Should
> > lines 9:11 be there?
> >
> >> atot
> >         mdate     value    value2
> > 1  2005-06-01        NA 0.5347852
> > 2  2005-06-02 0.5287683 0.9322765
> > 3  2005-06-03 0.7563833 0.9106499
> > 4  2005-06-04        NA 0.6810564
> > 5  2005-06-05 0.1027646 0.5871867
> > 6  2005-06-06 0.7775884 0.8123808
> > 7  2005-06-07 0.2993592 0.9675379
> > 8  2005-06-08        NA 0.9470369
> > 9  2005-06-09        NA 0.7493767
> > 10 2005-06-09        NA 0.7493767
> > 11 2005-06-09        NA 0.7493767
> > 12 2005-06-09 0.7434682 0.7493767
> > 13 2005-06-10 0.2096477 0.8864103
> >
> > Note with no NAs, it works perfectly and as expected...
> >> a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> >> names(a1) <- "mdate"
> >> a1$value <- runif(count)
> >> #a1[floor(runif(nacount)*count),]$value <- NA
> >>
> >> atot <- merge(a1,a2,all=T)
> >>
> >> atot
> >         mdate      value    value2
> > 1  2005-06-01 0.35002519 0.5347852
> > 2  2005-06-02 0.76318940 0.9322765
> > 3  2005-06-03 0.32759570 0.9106499
> > 4  2005-06-04 0.47218729 0.6810564
> > 5  2005-06-05 0.74435374 0.5871867
> > 6  2005-06-06 0.81415290 0.8123808
> > 7  2005-06-07 0.04774783 0.9675379
> > 8  2005-06-08 0.21799101 0.9470369
> > 9  2005-06-09 0.99472758 0.7493767
> > 10 2005-06-10 0.41974293 0.8864103
> >
> > R started in each case with --vanilla
> > _
> > platform i386-pc-mingw32
> > arch i386
> > os mingw32
> > system i386, mingw32
> > status Patched
> > major 2
> > minor 3.0
> > year 2006
> > month 05
> > day 11
> > svn rev 38037
> > language R
> > version.string Version 2.3.0 Patched (2006-05-11 r38037)
> >
> > win-xp-pro sp2 - binary installs from CRAN
> >
> >
> > it works in a similar way if I say
> > atot <- merge(a1,a2,by.x="mdate",by.y="mdate",all=T)
> > or even
> > atot <- merge(a1,a2,by="mdate",all=T)
> >
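The three calls above give the same result because merge() on data frames defaults 'by' to the column names the two inputs share, which here is just "mdate". A small sketch with made-up data (the frames x and y are hypothetical, not from the thread) illustrates this; it also spells out all = TRUE, since T is an ordinary variable that can be reassigned:

```r
# Hypothetical toy frames sharing only the 'mdate' column
x <- data.frame(mdate = as.Date("2005-06-01") + 0:2, value  = c(1, NA, 3))
y <- data.frame(mdate = as.Date("2005-06-01") + 1:3, value2 = c(20, 30, 40))

m_default <- merge(x, y, all = TRUE)                  # by = shared names
m_by      <- merge(x, y, by = "mdate", all = TRUE)
m_byxy    <- merge(x, y, by.x = "mdate", by.y = "mdate", all = TRUE)

print(identical(m_default, m_by))   # TRUE
print(identical(m_by, m_byxy))      # TRUE
print(nrow(m_default))              # 4: every key is unique, one row per date
```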
> > also tested on versions 2.2.1, 2.3.0
> >
> > cheers,
> > Sean O'Riordain
> >
> > (ps. ctrl-v paste wouldn't work on 2.4.0-dev downloaded this morning -
> > didn't try very hard though)
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
> --
> Brian D. Ripley, ripley@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>



Received on Sat May 20 23:10:06 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 21 May 2006 - 00:10:15 EST.
