From: AndyZon <andyzon_at_gmail.com>

Date: Fri, 11 Jan 2008 15:57:10 -0800 (PST)

*>
*

*>
*

> Yes, and if you are on a multicore system with multithreaded linear

*> algebra, crossprod() will distribute the job across the cores making the
*

*> elapsed time shorter (by almost half on my Core 2 Duo MacBook as long as I
*

*> have nothing else gobbling up CPU cycles)!
*

*>
*

*>
*

*>
*

> If I followed you here, and you have ONLY those three categories, then

*> yes.
*

*>
*

*> Try a test case with perhaps 3 SNPs and a few subjects. Table the results
*

*> the old fashioned way via table() or xtabs() or even by hand. Then look at
*

*> what crossprod( test.case } gives you.
*

*>
*

*> ---
*

*>
*

*> If '11' shows up you'll have to use a 'contr.treatment' style
*

*> approach. ( run 'example( contrasts )' and look at what is going on).
*

*>
*

*> Guess what these give:
*

*>
*

*> contrasts( factor( c( "00","01","10" ) ) )
*

*> contrasts( factor( c( "00","01","10","11" ) ) )
*

*>
*

*> then run them if you have trouble seeing why '11' changes the picture.
*

*>
*

*> ---
*

*>
*

*> BTW, what I said (below) suggests that crossprod() returns integer
*

*> values, but its storage.mode is actually "double".
*

*>
*

*> HTH,
*

*>
*

*> Chuck
*

*>
*

*> I am confused
*

*>
*

*>
*

*>
*

*>
*

*> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
*

*>
*

*> ______________________________________________
*

*> R-help_at_r-project.org mailing list
*

*> https://stat.ethz.ch/mailman/listinfo/r-help
*

*> PLEASE do read the posting guide
*

*> http://www.R-project.org/posting-guide.html
*

*> and provide commented, minimal, self-contained, reproducible code.
*

*>
*

*>
*

Date: Fri, 11 Jan 2008 15:57:10 -0800 (PST)

Charles C. Berry wrote:

*>
*

> On Thu, 10 Jan 2008, AndyZon wrote:

*>
*

>> >> Thank you so much, Chuck! >> >> This is brilliant, I just tried some dichotomous variables, it was really >> fast.

> Yes, and if you are on a multicore system with multithreaded linear

>> >> Most categorical variables I am interested in are 3 levels, they are >> actually SNPs, I want to look at their interactions. My question is: >> after >> generating 0-1 codings, like 00, 01, 10, how should I use "crossprod()"? >> Should I just apply this function on these 2*n columns (originally I have >> n >> variables), and then operate on the generated cell counts?

> If I followed you here, and you have ONLY those three categories, then

>> about this. >> >> Your input will be greatly appreciated. >> >> Andy

>> >> >> >> Charles C. Berry wrote: >>> >>> On Wed, 9 Jan 2008, AndyZon wrote: >>> >>>> >>>> Hi, >>>> >>>> I have a huge number of categorical variables, say at least 10000, and >>>> I >>>> put >>>> them into a matrix, each column is one variable. The question is: how >>>> can >>>> I >>>> make all of the pairwise cross tabulation tables efficiently? The >>>> straightforward solution is to use for-loops by looping two indexes on >>>> the >>>> table() function, but it was just too slow. Is there a more efficient >>>> way >>>> to >>>> do that? Any guidance will be greatly appreciated. >>> >>> The totals are merely the crossproducts of a suitably constructed binary >>> (zero-one) matrix is used to encode the categories. See >>> '?contr.treatment' >>> if you cannot grok 'suitably constructed'. >>> >>> If the categories are all dichotomies coded as 0:1, you can use >>> >>> res <- crossprod( dat ) >>> >>> to find the totals for the (1,1) cells >>> >>> If you need the full tables, you can get them from the marginal totals >>> using >>> >>> diag( res ) >>> >>> to get the number in each '1' margin and >>> >>> nrow(dat) >>> >>> to get the table total from which the numbers in each '0' margin by >>> subtracting the corresponding '1' margin. >>> >>> With dichotomous variables, dat has 10000 columns and you will only need >>> 10000^2 integers or about 0.75 Gigabytes to store the 'res'. And it >>> takes >>> about 20 seconds to run 1000 rows on my MacBook. Of course, 'res' has a >>> redundant triangle >>> >>> This approach generalizes to any number of categories: >>> >>> To extend this to more than two categories, you will need to do for each >>> such column what model.matrix(~factor( dat[,i] ) ) does by default >>> ( using 'contr.treatment' ) - construct zero-one codes for all but one >>> (reference) category. >>> >>> Note that with 10000 trichotomies, you will have a result with >>> >>> 10000^2 * ( 3-1 )^2 >>> >>> integers needing about 3 Gigabytes, and so on. >>> >>> HTH, >>> >>> Chuck >>> >>> p.s. Why on Earth are you doing this???? >>> >>> >>>> >>>> Andy >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/pairwise-cross-tabulation-tables-tp14723520p14723520.html >>>> Sent from the R help mailing list archive at Nabble.com. >>>> >>>> ______________________________________________ >>>> R-help_at_r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> Charles C. Berry (858) 534-2098 >>> Dept of Family/Preventive >>> Medicine >>> E mailto:cberry_at_tajo.ucsd.edu UC San Diego >>> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego >>> 92093-0901 >>> >>> ______________________________________________ >>> R-help_at_r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/pairwise-cross-tabulation-tables-tp14723520p14744086.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help_at_r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >>

>

> Charles C. Berry (858) 534-2098

> Dept of Family/Preventive> Medicine> E mailto:cberry_at_tajo.ucsd.edu UC San Diego

-- View this message in context: http://www.nabble.com/pairwise-cross-tabulation-tables-tp14723520p14767832.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Sat 12 Jan 2008 - 00:01:15 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sat 12 Jan 2008 - 19:30:06 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*