Re: [R] Essay identification

From: Greg Snow <>
Date: Tue 14 Jun 2005 - 02:02:25 EST

This topic is sometimes called wordprinting or stylometry. The spring 2003 issue of Chance magazine had several articles on the topic.

A colleague and I, along with various graduate students, have been working on a Perl program that extracts many of the common statistics used in wordprinting: counts and percentages of non-contextual words, word pattern ratios, and measures of vocabulary richness. The data can then be loaded into R (or any other stats package) for analysis.
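To give a rough idea of what such statistics look like, here is a minimal sketch in Python. It is not the Perl program mentioned above, only an illustration under stated assumptions: the function-word list, the tokenizer, and the names `style_features` and `FUNCTION_WORDS` are all hypothetical, and a real study would use a much longer word list and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical short list of non-contextual (function) words.
# This is an assumption for illustration, not a standard list.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def style_features(text, function_words=FUNCTION_WORDS):
    """Return a dict of simple stylometric statistics for one essay."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)

    features = {"n_tokens": n}
    # Percentage of all tokens accounted for by each function word.
    for word in function_words:
        features["pct_" + word] = 100.0 * counts[word] / n
    # Vocabulary richness: distinct words per token (type-token ratio).
    features["type_token_ratio"] = len(counts) / n
    # Share of distinct words that occur exactly once (hapax legomena).
    hapax = sum(1 for c in counts.values() if c == 1)
    features["hapax_ratio"] = hapax / len(counts)
    return features
```

Calling `style_features` once per essay gives one row of numbers per essay. Those rows could be written to a CSV file and read into R with `read.csv` for the actual classification step.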

The program is currently in beta: it is usable, but we may still add features and documentation. I can send a copy to anyone who is interested; please specify whether you have Perl or need a standalone copy (Windows only).

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
(801) 408-8111

>>> Werner Bier <> 06/12/05 01:29PM >>>
Hi R-help,  

I have a database of 10 students who have written a total of 78 essays.
The challenge? I would like to identify who wrote the 79th essay.  

Has anybody used R in this context?  

Even if not, could you suggest which pattern recognition technique I might apply?

Thanks a lot and regards,

Received on Tue Jun 14 02:11:45 2005