Re: [R] Essay identification

From: Greg Snow <greg.snow_at_ihc.com>
Date: Tue 14 Jun 2005 - 02:02:25 EST


This topic is sometimes called wordprinting or stylometry. The spring 2003 issue of Chance magazine had several articles on the topic.

A colleague and I (along with various graduate students) have been working on a Perl program that extracts many of the statistics commonly used in wordprinting (counts/percentages of non-contextual words, word-pattern ratios, vocabulary richness). The resulting data can then be loaded into R (or any other stats package) for analysis.
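For readers who want a feel for these statistics, here is a minimal Python sketch. It is not the Perl program described above. The function-word list, the function names (`wordprint`, `yules_k`, `write_feature_table`) and the CSV layout are all hypothetical choices made for illustration. For each essay it computes the percentage of a few non-contextual (function) words, the type-token ratio, and Yule's K as a vocabulary-richness measure. Word-pattern ratios are not covered.

```python
import csv
import re
from collections import Counter

# Hypothetical, deliberately short list of non-contextual (function) words;
# real wordprinting studies use much longer, carefully chosen lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "but"]


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters plus inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def yules_k(tokens):
    """Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2.

    V_i is the number of distinct words occurring exactly i times and N is
    the number of tokens. Lower values indicate a richer vocabulary.
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    freq_of_freqs = Counter(Counter(tokens).values())  # i -> V_i
    s2 = sum(i * i * v for i, v in freq_of_freqs.items())
    return 1e4 * (s2 - n) / (n * n)


def wordprint(text):
    """Return a dict of simple stylometric features for one essay."""
    tokens = tokenize(text)
    n = len(tokens)
    counts = Counter(tokens)
    # Percentage of tokens that are each function word.
    features = {
        f"pct_{w}": (100.0 * counts[w] / n if n else 0.0) for w in FUNCTION_WORDS
    }
    # Distinct words divided by total words.
    features["type_token_ratio"] = len(counts) / n if n else 0.0
    features["yules_k"] = yules_k(tokens)
    return features


def write_feature_table(essays, path):
    """Write one row of features per essay to a CSV file.

    `essays` maps a (hypothetical) essay id to the essay text.
    """
    rows = [{"essay": eid, **wordprint(text)} for eid, text in essays.items()]
    if not rows:
        return
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

The CSV produced by `write_feature_table` could then be read into R with `read.csv()` and analyzed there, in line with the workflow described above. Plain percentages and ratios are used so that essays of different lengths stay comparable.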

The program is currently in beta (usable, though we may add more features and documentation), but I can send a copy to anyone who is interested. Please specify whether you have Perl installed or need a stand-alone copy (Windows only).

hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow@ihc.com
(801) 408-8111

>>> Werner Bier <aliscla@yahoo.com> 06/12/05 01:29PM >>>
Hi R-help,  

I have a database of 78 essays written by 10 students.
The challenge? I would like to identify who wrote the 79th essay.  

Has anybody used R in this context?  

Even if not, could you suggest which pattern recognition technique I might apply?  

Thanks a lot and regards,
Tom                 





R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Jun 14 02:11:45 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:33 EST