Re: [R] how to merge within range?

From: René Mayer <mayer_at_psychologie.tu-dresden.de>
Date: Sat, 14 May 2011 21:08:12 +0200

sqldf is impressive - compiled it now;
the trick with findInterval is nice, too. thanks guys!!
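
For anyone skimming the thread, the findInterval approach quoted below can be assembled into one runnable sketch. It uses the example data from the thread, with the times typed in as fixed numbers instead of generated with rnorm (an assumption, so the result is reproducible). The names breaks and lookup are illustrative only. The sketch assumes the intervals in df.2 are sorted and do not overlap.

```r
# Example data from the thread; times are hard-coded (assumption) instead of
# being drawn with rnorm(), so the output is reproducible.
df.1 <- data.frame(time = c(101, 199, 301, 401, 501, 601, 700, 800, 900, 1000),
                   value = NA)
df.2 <- data.frame(from = c(99, 500, 799), to = c(303, 702, 950),
                   value = c(1, 3, 5))

# Interleave from/to into one sorted vector of break points:
# from1, to1, from2, to2, ...
breaks <- c(rbind(df.2$from, df.2$to))

# Odd results fall inside an interval, even results fall in a gap,
# 0 means "before the first interval".
i <- findInterval(df.1$time, breaks)
i[i == 0] <- NA  # guard: indexing with 0 would silently drop elements

# Interleave the values with NA so that odd positions hold the value
# and even positions (the gaps) hold NA.
lookup <- c(rbind(df.2$value, NA))
df.1$value <- lookup[i]
df.1
```

Note that findInterval treats each interval as closed on the left and open on the right, so a time exactly equal to a "to" value lands in the gap and gets NA. That differs from SQL's BETWEEN, which includes both end points.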

Quoting "David Winsemius" <dwinsemius_at_comcast.net>:

>
> On May 14, 2011, at 2:27 PM, William Dunlap wrote:
>
>> You could use findInterval() along with a trick with c(rbind(...)):
>>
>>> i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
>>> i
>> [1] 1 1 1 2 3 3 3 5 5 6
>
> That's nice. I was working on a slightly different "trick"
>
> findInterval( df.1[,1],t(df.2[,1:2]))
> [1] 1 1 1 2 3 3 3 5 5 6
>
> I was then trying to get the right indices with (.) %% 2 and (.) %/% 2
>
>
>> The even-valued outputs would map to NA's, the odds
>> to value[(i+1)/2], but you can use the c(rbind(...)) trick again:
>>
>>> c(rbind(df.2$value, NA))[i]
>> [1] 1 1 1 NA 3 3 3 5 5 NA
>
> I'd like to understand that. Maybe, maybe... ah, got it. At first I
> didn't realize those were the final answers since they looked like
> indices. My t(.) trick doesn't generalize as well.
>
>
> My earlier suggestion that two merges would do it was based on my
> erroneous interpretation of the example, since I thought the task
> was to match on the end points of the intervals.
>
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-bounces_at_r-project.org
>>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of René Mayer
>>> Sent: Saturday, May 14, 2011 11:06 AM
>>> To: David Winsemius
>>> Cc: r-help_at_r-project.org
>>> Subject: Re: [R] how to merge within range?
>>>
>>> thanks David and Ian,
>>> let me make a better example as the first one was flawed
>>>
>>> df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
>>> names(df.1) = c("time", "value")
>>> df.1
>>>    time value
>>> 1   101    NA
>>> 2   199    NA
>>> 3   301    NA
>>> 4   401    NA
>>> 5   501    NA
>>> 6   601    NA
>>> 7   700    NA
>>> 8   800    NA
>>> 9   900    NA
>>> 10 1000    NA
>>>
>>> # from and to define ranges within time,
>>> # note that from and to may not match the numbers given in time
>>> df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
>>> df.2
>>>   from  to value
>>> 1   99 303     1
>>> 2  500 702     3
>>> 3  799 950     5
>>>
>>> what I want is:
>>>    time value
>>> 1   101     1
>>> 2   199     1
>>> 3   301     1
>>> 4   401    NA
>>> 5   501     3
>>> 6   601     3
>>> 7   700     3
>>> 8   800     5
>>> 9   900     5
>>> 10 1000    NA
>>>
>>> @David I don't know what you mean by 2 merges,
>>> René
>>>
>>>
>>>
>>>
>>>
>>> Quoting "David Winsemius" <dwinsemius_at_comcast.net>:
>>>
>>>>
>>>> On May 14, 2011, at 9:16 AM, Ian Gow wrote:
>>>>
>>>>> If I assume that the third column in data.frame.2 is named "val" then in
>>>>> SQL terms it _seems_ you want
>>>>>
>>>>> SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN data.frame.2 AS b ON
>>>>> a.time BETWEEN b.start AND b.end;
>>>>>
>>>>> Not sure how to do that elegantly using R subsetting/merge,
>>>>
>>>> Huh? It's just two merge()'s (... once you fix the error in the example.)
>>>>
>>>> --
>>>> David
>>>>
>>>>> but you might
>>>>> try a package that allows you to use SQL, such as sqldf.
>>>>>
>>>>>
>>>>> On 5/14/11 8:03 AM, "David Winsemius" <dwinsemius_at_comcast.net> wrote:
>>>>>
>>>>>>
>>>>>> On May 14, 2011, at 8:12 AM, René Mayer wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> how can one merge
>>>>>>
>>>>>> And what happened when you typed:
>>>>>>
>>>>>> ?merge
>>>>>>
>>>>>>> two data frames when in the second data frame one column defines the
>>>>>>> start values
>>>>>>> and another defines the end value of the to be merged range.
>>>>>>> data.frame.1
>>>>>>> time ...
>>>>>>> 13
>>>>>>> 24
>>>>>>> 35
>>>>>>> 46
>>>>>>> 55
>>>>>>> ...
>>>>>>> data.frame.2
>>>>>>> start end
>>>>>>> 24 37 "h" ?
>>>>>>> ...
>>>>>>>
>>>>>>> should result in this
>>>>>>> 13 NA
>>>>>>> 24 "h"
>>>>>>> 35 "h"
>>>>>>> 46 NA
>>>>>>> 55
>>>>>>> ?
>>>>>>
>>>>>> And _why_ would that be?
>>>>>>
>>>>>>
>>>>>>> thanks,
>>>>>>> René
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help_at_r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained,
>>> reproducible code.
>>>>>>
>>>>>> David Winsemius, MD
>>>>>> West Hartford, CT
>>>>>>
>>>>>
>>>>>
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>>
>>>
>>>
>
> David Winsemius, MD
> West Hartford, CT
>
>
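
The SQL formulation quoted above (a LEFT JOIN with "time BETWEEN from AND to") can also be approximated in base R without any extra package. The following is only a sketch under assumptions: it uses the second example's column names (from, to, value), hard-codes the times from the printed output, and takes the first matching interval when several overlap. The names hit and idx are illustrative.

```r
# Example data from the thread; times are hard-coded (assumption).
df.1 <- data.frame(time = c(101, 199, 301, 401, 501, 601, 700, 800, 900, 1000))
df.2 <- data.frame(from = c(99, 500, 799), to = c(303, 702, 950),
                   value = c(1, 3, 5))

# hit[r, c] is TRUE when df.1$time[r] lies in [from[c], to[c]],
# both ends inclusive, like SQL's BETWEEN.
hit <- outer(df.1$time, df.2$from, ">=") & outer(df.1$time, df.2$to, "<=")

# For each time, the index of the first matching interval, or NA if none.
idx <- apply(hit, 1, function(r) which(r)[1])

# Left-join behaviour: every row of df.1 is kept, unmatched rows get NA.
df.1$value <- df.2$value[idx]
df.1
```

This builds a full time-by-interval matrix, so it suits small tables like the example. For large inputs the findInterval approach discussed in the thread avoids that cost.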



Received on Sat 14 May 2011 - 19:11:44 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 15 May 2011 - 21:50:07 GMT.
