Re: [R] Essay identification

From: Berton Gunter
Date: Mon 13 Jun 2005 - 07:43:58 EST

I assume you know that the usual procedure is to 'score' each essay with a vector giving the frequency of occurrence of commonly used words and phrases (sometimes adding subject-matter-specific ones). These vectors, labeled by author, are then fed in as a "training set" to your favorite supervised learning/classification procedure. R has many of these -- trees, logistic regression, boosting, random forests, SVMs, LDA, SOMs (whoops -- that's an unsupervised one), ... . Try

The devil is in the details as to what works best, I believe. With only 78 exemplars in 10 groups, classification may be difficult unless there is a lot of separation (disparate styles that you could probably detect manually). It also depends on how the essays are distributed across the groups (balanced group sizes are generally better).
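
As a rough illustration of the score-then-classify procedure described above, here is a minimal sketch in Python using only the standard library. The marker-word list, the toy essays, and the function names are all made up for illustration, and the nearest-centroid rule is just a simple stand-in for whichever supervised classifier you prefer; it is not a recommendation of that method.

```python
import math
import re
from collections import Counter

# Hypothetical marker words; a real study would use a much longer list of
# common function words, possibly plus subject-specific terms.
MARKERS = ["the", "and", "of", "but", "however", "which"]


def score(essay):
    """Relative frequency of each marker word in one essay."""
    words = re.findall(r"[a-z']+", essay.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # guard against an empty essay
    return [counts[w] / total for w in MARKERS]


def centroids(training):
    """Average the score vectors of each author's essays."""
    by_author = {}
    for author, essay in training:
        by_author.setdefault(author, []).append(score(essay))
    return {
        author: [sum(col) / len(vectors) for col in zip(*vectors)]
        for author, vectors in by_author.items()
    }


def predict(centres, essay):
    """Return the author whose centroid is closest in Euclidean distance."""
    v = score(essay)
    return min(centres, key=lambda a: math.dist(v, centres[a]))


# Made-up toy data standing in for the 78 labelled essays.
training = [
    ("A", "the cat and the dog and the bird sat still"),
    ("A", "the rain and the wind and the cold came early"),
    ("B", "however cold it was, which surprised us, we walked"),
    ("B", "we stayed, however, which nobody expected of us"),
]
centres = centroids(training)
guess = predict(centres, "the sun and the moon and the stars")
```

With real data, `training` would hold the 78 labelled essays and the 79th would be passed to `predict`. With so few essays per author, holding out each known essay in turn and checking whether it is assigned to its true author gives some sense of whether the scores separate the authors at all.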


-----Original Message-----
On Behalf Of Werner Bier
Sent: Sunday, June 12, 2005 12:30 PM

Subject: [R] Essay identification

Hi R-help,  

I have a database of 10 students who have written a total of 78 essays. The challenge? I would like to identify who wrote the 79th essay.  

Has anybody used R in this context?  

Even if not, could you suggest which pattern recognition technique I might apply?  

Thanks a lot and regards,

        [[alternative HTML version deleted]]

mailing list PLEASE do read the posting guide!
Received on Mon Jun 13 07:51:58 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:33 EST