Date: Sun 25 Jun 2006 - 19:41:52 EST

"Gary Collins" <collins.gs@gmail.com> writes:

> looking at the help page/code in STATA for tetrachoric, it says it
> estimates the tetrachoric correlation via the approximation suggested
> by Edwards & Edwards (1984), "Approximating the tetrachoric
> correlation", Biometrics, 40(2): 563.
> that is,
> (alpha^(pi/4) - 1) / (alpha^(pi/4) + 1), where alpha is ad/bc
> i.e.
> > alpha=(522 * 22)/(34 * 54)
> > (alpha^(pi/4)-1) / (alpha^(pi/4)+1)
> [1] 0.6168851

...and the approximation is obviously quite far off the mark in this case. Presumably (I'm lazy) the approximation holds for the odds ratio alpha close to 1 (rho close to 0) and/or marginal distributions close to 50:50.
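
For anyone who wants to see the size of the gap without Stata or polycor at hand, here is a rough base-R sketch. ee_approx(), pbinorm() and tetra() are names made up for this illustration, not functions from any package. tetra() takes the thresholds from the table margins and solves for the rho whose bivariate normal cell probability matches the observed upper-left cell. For a 2x2 table that should coincide with the estimate polychor() reports, but treat it as a check under those assumptions rather than as a description of what polychor() does internally.

```r
# Edwards & Edwards (1984) approximation as quoted above: alpha = ad/bc
ee_approx <- function(tab) {
  alpha <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  (alpha^(pi / 4) - 1) / (alpha^(pi / 4) + 1)
}

# P(X < h, Y < k) for a standard bivariate normal with correlation rho,
# by one-dimensional numerical integration (hypothetical helper)
pbinorm <- function(h, k, rho) {
  integrand <- function(x) dnorm(x) * pnorm((k - rho * x) / sqrt(1 - rho^2))
  integrate(integrand, -Inf, h, rel.tol = 1e-8)$value
}

# Tetrachoric correlation for a 2x2 table: thresholds from the margins,
# then solve for the rho that reproduces the observed cell [1,1]
tetra <- function(tab) {
  n   <- sum(tab)
  h   <- qnorm(sum(tab[1, ]) / n)   # row threshold
  k   <- qnorm(sum(tab[, 1]) / n)   # column threshold
  p11 <- tab[1, 1] / n
  uniroot(function(rho) pbinorm(h, k, rho) - p11,
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

janet <- matrix(c(522, 54, 34, 22), 2, 2, byrow = TRUE)    # Janet's data
brown <- matrix(c(1562, 42, 383, 94), 2, 2, byrow = TRUE)  # Brown (1977)

for (tab in list(janet, brown)) {
  cat("approximation:", ee_approx(tab), " root-solved:", tetra(tab), "\n")
}
```

On Janet's table the first function reproduces the 0.6169 that Stata printed, and the second should land close to the 0.5172 from polychor().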

There's a Stata package "polychoric" which claims to do things more accurately, referred to at

http://www.ats.ucla.edu/STAT/stata/faq/tetrac.htm

(I believe I mentioned this before, but possibly in a private mail to Janet which never reached r-help).

> Gary
>
> On 25/06/06, John Fox <jfox@mcmaster.ca> wrote:
> > Dear Janet,
> > A good thing to do when different software gives different answers is
> > to check each against known results. I'm away from home, and don't have
> > all of the examples that I used to check polychor(), but I dug up the
> > following. The polychor() function produces output that agrees with
> > both of these sources. How does Stata do?
> > > # example from Drasgow (1988), pp. 69-74 in Kotz and Johnson,
> > > # Encyclopedia of statistical sciences. Vol. 7.
> > > tab
> >      [,1] [,2] [,3]
> > [1,]   58   52    1
> > [2,]   26   58    3
> > [3,]    8   12    9
> > > polychor(tab, std.err=TRUE)
> > Polychoric Correlation, 2-step est. = 0.42 (0.07474)
> > Test of bivariate normality: Chisquare = 11.55, df = 3, p = 0.009078
> > > polychor(tab, ML=TRUE, std.err=TRUE)
> > Polychoric Correlation, ML est. = 0.4191 (0.07616)
> > Test of bivariate normality: Chisquare = 11.54, df = 3, p = 0.009157
> >
> >   Row Thresholds
> >   Threshold Std.Err.
> > 1  -0.02988  0.08299
> > 2   1.13300  0.10630
> >
> >   Column Thresholds
> >   Threshold Std.Err.
> > 1   -0.2422  0.08361
> > 2    1.5940  0.13720
> >
> > > tab # example from Brown (1977) Applied Statistics, 26:343-351.
> >      [,1] [,2]
> > [1,] 1562   42
> > [2,]  383   94
> >
> > > polychor(tab)
> > [1] 0.595824
> >
> > Regards,
> > John
> > On Fri, 23 Jun 2006 14:33:31 -0700
> > Janet Rosenbaum <jrosenba@rand.org> wrote:
> > > Peter --- Thanks for pointing out the omitted information. The hazards
> > > of attempting to be brief.
> > >
> > > In R, I am using polychor(vec1, vec2, std.err=T) and have used both the
> > > ML and 2 step estimates, which give virtually identical answers. I am
> > > explicitly using only the 632 complete cases in R to make sure missing
> > > data is handled the same way as in stata.
> > >
> > > Here's my data:
> > >
> > > 522 54
> > >  34 22
> > >
> > > > polychor(v1, v2, std.err=T, ML=T)
> > > Polychoric Correlation, ML est. = 0.5172 (0.08048)
> > > Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
> > >
> > >   Row Thresholds
> > >   Threshold Std.Err.
> > > 1     1.349  0.07042
> > >
> > >   Column Thresholds
> > >   Threshold Std.Err.
> > > 1     1.174  0.06458
> > >
> > > Warning message:
> > > NaNs produced in: pchisq(q, df, lower.tail, log.p)
> > >
> > > In stata, I get:
> > >
> > > . tetrachoric t1_v19a ct1_ix17
> > >
> > > Tetrachoric correlations (N=632)
> > >
> > > ----------------------------------
> > >     Variable |  t1_v19a  ct1_ix17
> > > -------------+--------------------
> > >      t1_v19a |        1
> > >     ct1_ix17 |    .6169         1
> > > ----------------------------------
> > >
> > > Janet
> > >
> > > Peter Dalgaard wrote:
> > > > Janet Rosenbaum <jrosenba@rand.org> writes:
> > > >
> > > >> I hope someone here knows the answer to this since it will save me from
> > > >> delving deep into documentation.
> > > >>
> > > >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> > > >> correlation coefficients in stata are almost uniformly higher than those
> > > >> in R, sometimes dramatically so (TCC=.61 in stata, .51 in R; .51 in
> > > >> stata, .39 in R). Stata's estimate is higher than R's in 20 out of 22
> > > >> computations, although the estimates always fall within the 95% CI for
> > > >> the TCC calculated by R.
> > > >>
> > > >> Do stata and R calculate TCC in dramatically different ways? Is the
> > > >> handling of missing data perhaps different? Any thoughts?
> > > >>
> > > >> Btw, I am sending this question only to the R-help list.
> > > >
> > > > A bit more information seems necessary:
> > > >
> > > > - tetrachoric correlations depend on 4 numbers, so you should be able
> > > > to give a direct example
> > > >
> > > > - you're not telling us how you calculate the TCC in R. This is not
> > > > obvious (package polycor?).
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

