[R] Exploratory multivariate analysis of categorical data

From: Tim Finney <tjf2n_at_virginia.edu>
Date: Thu 11 Jan 2007 - 19:53:49 GMT

This is my first post to R-help. I am doing some research into the text of the New Testament, specifically places where textual variation occurs across manuscripts. (See http://purl.org/tfinney/NTText/book/index.html for details.)

New Testament textual critics call places where the text varies "variation units," and each state of the text in a variation unit is called a "reading." The apparatus of a critical edition can be transformed into a data matrix by treating each witness (typically a manuscript, but sometimes an early version or a church father) as an observation (i.e., a row) and each variation unit as a variable (i.e., a column). Readings consist of words or phrases, and I encode them as numerals in the data matrix, so the numerals are category labels rather than quantities. (There are often more than two readings in a variation unit.) I make a dissimilarity matrix by calculating, for each pair of witnesses, the proportion of variation units in which the two disagree.
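To make the encoding concrete, here is a minimal sketch of the dissimilarity calculation described above. It is written in Python purely for illustration; the witness names and readings are made-up toy data, and the handling of missing readings (`None`, skipped pairwise) is an assumption of the sketch, not something fixed by the description above.

```python
from itertools import combinations

# Hypothetical toy data matrix: one row per witness, one column per
# variation unit. Numerals are labels for readings, not quantities.
# None marks a unit where the witness has no reading (assumed convention).
witnesses = {
    "A": [1, 1, 2, 1],
    "B": [1, 2, 2, 1],
    "C": [2, 2, 1, None],
}

def dissimilarity(x, y):
    """Proportion of variation units in which two witnesses disagree,
    counting only units where both witnesses have a reading."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return float("nan")  # no units in common, so nothing to compare
    return sum(a != b for a, b in pairs) / len(pairs)

# Build a symmetric dissimilarity matrix as a dict of dicts.
names = sorted(witnesses)
dist = {n: {m: 0.0 for m in names} for n in names}
for n, m in combinations(names, 2):
    dist[n][m] = dist[m][n] = dissimilarity(witnesses[n], witnesses[m])

for n in names:
    print(n, [round(dist[n][m], 3) for m in names])
```

Because the comparison is only "same reading or not," the numeral assigned to each reading is arbitrary; recoding the readings in a unit leaves the matrix unchanged.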

Here is my question: Which exploratory multivariate techniques are applicable to this kind of data matrix and this kind of dissimilarity matrix? From reading the R docs, it seems to me that multidimensional scaling (MDS, both metric and non-metric) and hierarchical clustering are appropriate, but I am not so sure about others.
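As an illustration of what hierarchical clustering does with a dissimilarity matrix of this kind, here is a minimal sketch of agglomerative clustering with average linkage. It is in Python for illustration only; the four witnesses and their dissimilarities are hypothetical, and average linkage is just one possible choice of linkage rule. In R, the corresponding step would normally be done with `hclust()` on a `dist` object.

```python
# Hypothetical proportions of disagreement between four witnesses.
labels = ["A", "B", "C", "D"]
d = {
    ("A", "B"): 0.10, ("A", "C"): 0.60, ("A", "D"): 0.70,
    ("B", "C"): 0.50, ("B", "D"): 0.65, ("C", "D"): 0.20,
}

def pair_dist(x, y):
    """Look up a dissimilarity regardless of the order of the pair."""
    return d.get((x, y), d.get((y, x)))

def average_linkage(c1, c2):
    """Mean dissimilarity over all cross-cluster pairs of witnesses."""
    return sum(pair_dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

# Start with every witness in its own cluster, then repeatedly merge
# the two closest clusters, recording each merge and its height.
clusters = [(name,) for name in labels]
merges = []
while len(clusters) > 1:
    height, i, j = min(
        (average_linkage(clusters[i], clusters[j]), i, j)
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    merges.append((clusters[i], clusters[j], height))
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 4))
```

The recorded merge heights are what a dendrogram plots, so the output can be read as a text version of the tree.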


Tim Finney

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri Jan 12 07:01:24 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 11 Jan 2007 - 21:30:26 GMT.
