Re: [R] linkage disequilibrium

From: David Duffy <David.Duffy_at_qimr.edu.au>
Date: Mon 08 Aug 2005 - 22:30:55 EST

> Date: Thu, 4 Aug 2005 19:36:35 +0200
> From: Cristian <cristian@biometria.univr.it>
> Subject: [R] linkage disequilibrium
> To: r-help@stat.math.ethz.ch
> Message-ID: <1123176995.42f252238a47a@biometria.univr.it>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I'm using the package "genetics", and I'm interested in the computation of the D'
> statistic for linkage disequilibrium, for which the LD() function has been
> implemented. Unfortunately I can't find any reference on how D' is computed
> by the LD() function. The package documentation refers to it only as an
> "MLE" estimate, but no references are provided. Does anybody know how it is
> obtained or, at least, have some references?
>
> Are there any other R packages that compute D' for both phased and
> unphased genotypes?
>
> Thanks! Cristian
>

You need to look at the code:
getAnywhere("LD.genotype")

See any standard reference such as Bruce Weir's _Genetic Data Analysis_ (Sinauer Associates) or Pak Sham's book on statistical genetics for the background to the algorithm.

The chi-square statistic for testing D=0 reported by LD() is twice what it should be, and you may be confused (I know I was) by the fact that the marginal allele frequencies are estimated using the non-missing data for each locus in turn. This means the bounds (pmin and pmax) for the AB haplotype frequency differ from those implied by the actual table used to maximize the likelihood. As a result, you will get different answers from programs that use only jointly complete observations.
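
As a rough illustration of the point about the bounds, the sketch below uses base R only (it does not call the genetics package) and made-up genotype data. The vectors g1 and g2 and the helpers allele_freq() and ab_bounds() are hypothetical names introduced here; the bounds are the usual ones for a haplotype frequency given the two marginal allele frequencies, which is an assumption about what pmin and pmax represent rather than a copy of the LD() code.

```r
# Hypothetical toy data: number of "A" alleles (0, 1, 2) at locus 1 and of
# "B" alleles at locus 2 for ten individuals; NA marks a missing genotype.
g1 <- c(2, 2, 1, 1, 0, 2, NA, 1, 2, 0)
g2 <- c(2, 1, 1, NA, 0, 2, 2, NA, 1, 0)

# Allele frequency from allele counts, ignoring missing genotypes.
allele_freq <- function(g) sum(g, na.rm = TRUE) / (2 * sum(!is.na(g)))

# Bounds on the AB haplotype frequency implied by the marginals pA and pB.
ab_bounds <- function(pA, pB) {
  Dmin <- max(-pA * pB, -(1 - pA) * (1 - pB))
  Dmax <- min(pA * (1 - pB), (1 - pA) * pB)
  c(pmin = pA * pB + Dmin, pmax = pA * pB + Dmax)
}

# (a) each locus uses all of its own non-missing genotypes
per_locus <- ab_bounds(allele_freq(g1), allele_freq(g2))

# (b) only individuals typed at both loci contribute
both <- !is.na(g1) & !is.na(g2)
joint <- ab_bounds(allele_freq(g1[both]), allele_freq(g2[both]))

print(rbind(per_locus, joint))
```

With these toy data the two rows differ, so an estimate of the AB frequency constrained by one set of bounds need not agree with one constrained by the other.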

Several other packages for haplotype analysis are on CRAN. Package haplo.stats has the haplo.em() function to give the MLEs for the haplotype frequencies. From these you can easily calculate D etc. Package hwde estimates nonstandard disequilibrium coefficients in a loglinear framework, and can be used to compare different sample disequilibria. Note that haplo.stats and hapassoc are aimed specifically at comparing groups or testing for association to other traits. My package gllm is not as easy to use but can combine phased and unphased data in loglinear models -- you could probably use cat in the same way.
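
Going from haplotype frequencies to D and related measures might look like the following minimal sketch. It assumes two biallelic loci and a named vector of the four haplotype frequencies, however they were estimated; the numbers and the function name ld_from_haplotypes() are hypothetical and are not taken from any of the packages mentioned above.

```r
# Hypothetical haplotype frequencies for alleles A/a at locus 1 and B/b at
# locus 2; in practice these would be estimates (for example from an EM fit).
hap <- c(AB = 0.50, Ab = 0.10, aB = 0.15, ab = 0.25)
stopifnot(isTRUE(all.equal(sum(hap), 1)))

ld_from_haplotypes <- function(hap) {
  pA <- hap[["AB"]] + hap[["Ab"]]   # marginal frequency of allele A
  pB <- hap[["AB"]] + hap[["aB"]]   # marginal frequency of allele B
  D  <- hap[["AB"]] - pA * pB       # disequilibrium coefficient
  # Largest |D| attainable for these marginals, which depends on the sign of D.
  Dmax <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  r <- D / sqrt(pA * (1 - pA) * pB * (1 - pB))
  c(D = D, Dprime = D / Dmax, r = r)
}

res <- ld_from_haplotypes(hap)
print(res)
```

Note that Dmax is zero when either locus is monomorphic, so D' and r are undefined in that case and the sketch does not guard against it.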

David Duffy.

| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD@qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Aug 08 22:43:39 2005