Re: [R] Most often pairs of chars across grouping variable

From: <svga_at_arcor.de>
Date: Wed, 30 Jul 2008 16:28:21 +0200 (CEST)

Hi Marc,

many thanks, that is exactly what I was looking for.

Best, Sven

> on 07/29/2008 09:51 AM svga@arcor.de wrote:
> > Hi list,
> >
> > is there a package or function to compute the frequencies of pairs of
> > chars in a variable across a grouping variable? Eg:
> >
> >
> > d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
> > > d
> >   ID F
> > 1  1 A
> > 2  1 B
> > 3  1 C
> > 4  2 A
> > 5  2 C
> > 6  2 D
> >
> >
> > Now I want to summarize the frequencies of all pairs A-B, A-C, A-D,
> > B-C, B-D, C-D across ID:
> >
> >   A B C D
> > A - 1 2 1
> > B - - 1 0
> > C - - - 1
> >
> >
> > Here, the combination A-C is the most frequent. The real problem behind
> > this is that 'F' codes diagnoses, and I am looking for the most
> > frequent pairs of diagnoses.
> >
> > Thanks, Sven
>
> I suspect that there might be something over in Bioconductor, but here
> is one approach:
>
> > table(data.frame(t(do.call(cbind,
>     tapply(d$F, d$ID,
>            function(x) combn(as.character(x), 2))))))
>    X2
> X1  B C D
>   A 1 2 1
>   B 0 1 0
>   C 0 0 1
>
>
> See ?combn for how the initial pairs are created from the data. This is
> done on a per-ID basis using tapply. The per-ID results are bound
> together with cbind, transposed, and converted to a data frame, and
> table() is then used to create the cross tabulation of the pairs.
>
> HTH,
>
> Marc Schwartz
>
>
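For anyone following along, the one-liner quoted above can be unpacked into separate steps. This is only a sketch, not Marc's exact code. It additionally sorts and de-duplicates the codes within each ID, on the assumption that A-C and C-A should count as the same pair and that a diagnosis repeated within one ID should not pair with itself. It also assumes every ID has at least two distinct codes, since combn() stops with an error otherwise. The names pairs_by_id, pairs, pair_counts, counts_df and top are made up for illustration.

```r
# Example data from the thread: two IDs with three diagnosis codes each.
d <- data.frame(ID = gl(2, 3), F = c("A", "B", "C", "A", "C", "D"))

# Step 1: build all unordered pairs within each ID.
# Each list element is a 2 x k character matrix, one column per pair.
# sort(unique(...)) is an added assumption, not part of the quoted code.
pairs_by_id <- tapply(d$F, d$ID, function(x) {
  combn(sort(unique(as.character(x))), 2)
})

# Step 2: bind the per-ID matrices side by side, then transpose so that
# each row holds one pair.
pairs <- t(do.call(cbind, pairs_by_id))

# Step 3: cross-tabulate the first member of each pair against the second.
pair_counts <- table(first = pairs[, 1], second = pairs[, 2])
print(pair_counts)

# Step 4: pull out the most frequent pair(s) from the long form of the table.
counts_df <- as.data.frame(pair_counts, stringsAsFactors = FALSE)
top <- counts_df[counts_df$Freq == max(counts_df$Freq), ]
print(top)
```

Keeping the intermediate objects around makes it easier to check each stage on real data, for example to inspect pairs_by_id for a single ID before tabulating.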



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 30 Jul 2008 - 14:38:07 GMT
