From: <noxyport_at_gmail.com>

Date: Tue, 10 May 2011 15:52:02 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 10 May 2011 - 14:11:12 GMT


On Tue, May 10, 2011 at 3:09 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:

> On May 10, 2011, at 3:18 AM, noxyport@gmail.com wrote:
>
> On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsemius_at_comcast.net>
>> wrote:
>>
>>>
>>> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>>>
>>>
>>>> Gabor Grothendieck wrote:
>>>>
>>>>>
>>>>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> consider the following two dataframes:
>>>>>> x1=c("232","3454","3455","342","13")
>>>>>> x2=c("1","1","1","0","0")
>>>>>> data1=data.frame(x1,x2)
>>>>>>
>>>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>>>>> data2=data.frame(y1,y2)
>>>>>>
>>>>>> I need a new column in dataframe data1 (x3), which is either 0 or 1
>>>>>> depending if the value "E1" in y2 of data2 is true while x1=y1. The
>>>>>> result of data1 should look like this:
>>>>>>     x1 x2 x3
>>>>>> 1  232  1  1
>>>>>> 2 3454  1  1
>>>>>> 3 3455  1  0
>>>>>> 4  342  0  0
>>>>>> 5   13  0  1
>>>>>>
>>>>>> I think a SQL command could help me but I am too inexperienced with it
>>>>>> to get there.
>>>>>>
>>>>>>
>>>>> Try this:
>>>>>
>>>>> library(sqldf)
>>>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2
>>>>>        on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>>>
>>>>>     x1 x2 x3
>>>>> 1  232  1  1
>>>>> 2 3454  1  1
>>>>> 3 3455  1  0
>>>>> 4  342  0  0
>>>>> 5   13  0  1
>>>>>
>>>>>
>>>>> snipped Gabor's sig
>>>
>>>>
>>>> That works pretty cool but I need to automate this a bit more. Consider
>>>> the following example:
>>>>
>>>> list1=c("A01","B04","A64","G84","F19")
>>>>
>>>> x1=c("232","3454","3455","342","13")
>>>> x2=c("1","1","1","0","0")
>>>> data1=data.frame(x1,x2)
>>>>
>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>>>> data2=data.frame(y1,y2)
>>>>
>>>> I now want to create a loop that creates a new binary variable in data1
>>>> for every value in list1. The result should look like:
>>>>   x1 x2 A01 B04 A64 G84 F19
>>>>  232  1   0   1   0   0   0
>>>> 3454  1   0   0   1   0   1
>>>> 3455  1   0   0   0   0   0
>>>>  342  0   0   0   0   0   0
>>>>   13  0   1   0   0   1   1
>>>>
>>>
>>> Loops!?! We don't need no steenking loops!
>>>
>>> xtb <- with(data2, table(y1,y2))
>>> cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>>>
>>>        x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
>>> 232   232  1   0   0   1   1   0   0   0   0   0
>>> 3454 3454  1   0   1   0   0   0   1   0   0   0
>>> 3455 3455  1   0   0   0   0   1   0   0   0   0
>>> 342   342  0   0   0   0   0   0   0   0   0   1
>>> 13     13  0   1   0   0   0   0   1   1   1   0
>>>
>>> I am guessing that you were too ... er, busy? ... to complete the table?
>>>
>>> --
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>> Thanks a lot! Pretty simple. I am so used to sqldf by now.
>>
>> So how would you handle more complicated strings like these:
>> y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
>> y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
>>      "F68","G84","F19","A01")
>> data2=data.frame(y1,y2)
>>
>> Where you want to extract, for instance, all "A01" from the strings?
>>
>
> I think you need either to explain what you want in more words of the
> English language or to offer an example of the desired output. I suspect you
> did not want something as simple as this:
>
> > A01.instances <- grep("A01" , data2$y2)
> > A01.instances
> [1]  2 11
> > data2[A01.instances, ]
>     y1          y2
> 2  232 B04 A01 F19
> 11  13         A01
>
> Or maybe you did?
>
> --
> David Winsemius, MD
> West Hartford, CT

With sqldf I could do it manually:

> data1=sqldf("SELECT data1.*, max(data2.y2 LIKE '% A01%') OR max(data2.y2
    LIKE 'A01%') A01 FROM data1 left join data2 on (data1.x1 = data2.y1) group
    by data1.x1, data2.y1")
> data1=sqldf("SELECT data1.*, max(data2.y2 LIKE '% B04%') OR max(data2.y2
    LIKE 'B04%') B04 FROM data1 left join data2 on (data1.x1 = data2.y1) group
    by data1.x1, data2.y1")
> data1
    x1 x2 A01 B04
1   13  0   1   0
2  232  1   1   1
3  342  0   0   0
4 3454  1   0   0
5 3455  1   0   0

But I need to automate this for several thousand "substrings". Any suggestions?
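
A possible base-R sketch of that automation follows. It is not taken from the thread. It assumes the codes inside each y2 string are separated by whitespace and that a code should match only as a whole token, which is stricter than the LIKE patterns above since those would also match longer codes starting with "A01". The names codes, tokens, long, flags and result are illustrative.

```r
# Example data copied from the thread
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(
  y1 = c("232", "232", "232", "3454", "3454", "3455", "342",
         "13", "13", "13", "13"),
  y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22", "H44 C35",
         "F68", "G84", "F19", "A01"),
  stringsAsFactors = FALSE)

# Codes to flag (assumption: in practice this vector holds thousands of entries)
codes <- c("A01", "B04", "A64", "G84", "F19")

# Split each y2 string on whitespace and repeat y1 once per token
tokens <- strsplit(data2$y2, "[[:space:]]+")
long <- data.frame(y1 = rep(data2$y1, lengths(tokens)),
                   code = unlist(tokens),
                   stringsAsFactors = FALSE)

# Cross-tabulate ids against the codes of interest. The factor levels keep
# ids and codes that never occur as all-zero rows and columns, and tokens
# outside 'codes' become NA and are dropped by table().
xtb <- table(factor(long$y1, levels = data1$x1),
             factor(long$code, levels = codes))

# Turn counts into 0/1 flags and attach them to data1 in its original row order
flags <- (unclass(xtb) > 0) + 0L
result <- cbind(data1, as.data.frame(flags))
rownames(result) <- NULL
print(result)
```

Because the strings are split once and tabulated in a single call, no per-code query or loop is needed. Unlike the sqldf output above, the rows stay in the order of data1.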



Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 10 May 2011 - 15:10:05 GMT.
