[R] genotype analysis

From: Anne-Marie Ternes <amternes_at_gmail.com>
Date: Wed, 26 Mar 2008 11:01:26 +0100


Dear mailing list,

I'm still quite a newbie in the statistical analysis of genotype/allele data and, more generally, in the analysis of categorical variables. Moreover, I'm currently thoroughly confused by the many R packages available for such analyses.

Here is my case: I've got a list of genes and a number of case-control population pairs and, for each population and gene, the various genotypes that have been found. I've got both aggregate data (e.g. gene1: homozygote wildtype: 201, heterozygote mutation carrier: 34, homozygote mutation carrier: 5) and per-gene data (i.e. for gene1 a list of genotypes such as "V/V", "V/I", "I/I", etc.).

The question asked is whether there is a difference in the mutation pattern between the case and the control groups that influences the outcome, both at the level of a single gene and at the level of their combination. Moreover, I would like to check for linkage disequilibrium (LD), as I know that some of these genes are located quite close together on the chromosome.

OK, so up to now I've been using chi-square tests, McNemar's test for matched pairs, and Fisher's exact test when my counts were too small.
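
To make this concrete, a minimal sketch of such tests in base R might look like the following. Only the control row reuses the gene1 counts quoted above; the case row, the genotype labels and the short per-individual vectors are made up purely for illustration.

```r
# Genotype counts per group. Only the control row reuses the gene1 example
# counts from the text; the case row is hypothetical.
counts <- matrix(
  c(201, 34, 5,    # controls (gene1 example counts)
    180, 50, 10),  # cases (hypothetical, for illustration only)
  nrow = 2, byrow = TRUE,
  dimnames = list(group    = c("control", "case"),
                  genotype = c("wt/wt", "wt/mut", "mut/mut"))
)

# Pearson chi-square test of independence between group and genotype
chi <- chisq.test(counts)
chi$expected   # inspect expected counts; small values argue for an exact test
chi$p.value

# Fisher's exact test on the same table, for when expected counts are small
fisher_p <- fisher.test(counts)$p.value

# Per-individual genotype strings can be tabulated into the same layout
geno  <- c("V/V", "V/I", "V/V", "I/I", "V/I", "V/V")                  # hypothetical
group <- c("case", "case", "control", "control", "case", "control")  # hypothetical
tab   <- table(group, geno)
```

McNemar's test (`mcnemar.test`) is left out of the sketch because it needs a square table of paired outcomes, which the aggregate counts above do not provide.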

As for the LD question, if I have understood correctly, I have to use log-linear regression. I have tried several R packages and am now quite confused, because I don't know which one is best suited to my problem. I should add that I'm also new to log-linear regression...

I've used "hwde" and read the paper on which it is based (see the hwde documentation), but the package leaves out certain output rows that are shown in the paper, and it doesn't indicate which of the output rows are significant, as the paper does. Is there any simple way to interpret "hwde" output (something like a p-value)?
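
For comparison, the classical chi-square goodness-of-fit test for Hardy-Weinberg proportions does yield a single p-value per gene. The sketch below is not the log-linear approach used by "hwde"; it is a plain base-R calculation on the gene1 counts quoted above, and the allele labels `AA`, `Aa`, `aa` are placeholders chosen for illustration.

```r
# Observed genotype counts for gene1 (aggregate example from the text);
# the labels AA / Aa / aa are placeholders for wildtype / carrier / mutant.
obs <- c(AA = 201, Aa = 34, aa = 5)
n   <- sum(obs)

# Estimated frequency of the wild-type allele and of the mutant allele
p <- (2 * obs[["AA"]] + obs[["Aa"]]) / (2 * n)
q <- 1 - p

# Genotype counts expected under Hardy-Weinberg proportions
expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Chi-square goodness-of-fit statistic with 1 degree of freedom
# (3 genotype classes - 1 - 1 allele frequency estimated from the data)
stat    <- sum((obs - expected)^2 / expected)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# Caveat: the expected count of the rare homozygote is small here, so the
# chi-square approximation is rough and an exact test would be preferable.
expected
p_value
```

A small p-value here would suggest that the genotype counts depart from Hardy-Weinberg proportions in that group; it says nothing by itself about case-control differences or LD.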

Then there are the "GeneticsBase", "Genetics", "mapLD" and "Hardy-Weinberg" packages. Some work only for a single gene, some apply something called "MLE", some use "generalized linear models", etc.

I know these are as much basic statistics questions as R questions. But I'd be glad if you could help me find the best solution for my type of analysis, or point me to good resources that show me how to do it. The problem is that most resources show how to run the analysis, but they don't explain at all how to *interpret* the output.

Thanks a lot in advance,

Anne-Marie



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 26 Mar 2008 - 13:16:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 26 Mar 2008 - 13:30:23 GMT.
