Re: [R] Essay identification

From: Berton Gunter <gunter.berton_at_gene.com>
Date: Mon 13 Jun 2005 - 07:43:58 EST


I assume you know that the usual procedure is to 'score' each essay by a vector giving the frequency of occurrence of commonly used words and phrases (sometimes supplemented with subject-matter-specific ones). These multivariate scores, labeled by author, are then fed as a "training set" into your favorite supervised learning/classification procedure. R has many of these -- trees, logistic regression, boosting, Random Forests, SVMs, LDA, SOMs (whoops -- that's an unsupervised one), ... . Try
RSiteSearch('Classification', restrict = 'functions').
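
As a rough illustration of the scoring step, here is a minimal base-R sketch. The three toy essays, the short word list in `vocab`, and the helper names `tokenize` and `score_essay` are all made up for this example; a real analysis would use the actual essays and a much longer list of common words.

```r
# Hypothetical toy data: three short "essays" standing in for the real ones.
essays <- c(a1 = "The cat sat on the mat and the dog sat too",
            a2 = "A dog and a cat ran to the park",
            b1 = "It is the case that the result is of interest")

# Function words chosen for illustration only; not a recommended list.
vocab <- c("the", "a", "and", "of", "to", "is", "it", "that", "on")

# Split an essay into lowercase word tokens, dropping empty strings.
tokenize <- function(txt) {
  words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Score one essay: relative frequency of each vocabulary word.
score_essay <- function(txt, vocab) {
  words <- tokenize(txt)
  counts <- table(factor(words, levels = vocab))  # words outside vocab are dropped
  as.numeric(counts) / length(words)
}

# Rows are essays, columns are vocabulary words.
X <- t(sapply(essays, score_essay, vocab = vocab))
colnames(X) <- vocab
print(round(X, 3))
```

The matrix `X`, together with a factor of known authors, is the kind of training set that the classification functions expect.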

The devil is in the details as to what works best, I believe. With only 78 exemplars in 10 groups (fewer than 8 essays per author on average), it may be difficult unless there is a lot of separation, i.e. disparate styles that you could probably detect manually. It also depends on how large each group is (balance is generally better).
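
One way to get a feel for how hard the problem is: tabulate the group sizes and estimate the error rate by leave-one-out cross-validation. The sketch below uses lda() from MASS with CV = TRUE on simulated data only; the group sizes, the number of features, and the amount of separation between authors are assumptions made for illustration, not properties of the real essays.

```r
library(MASS)  # recommended package shipped with standard R installations

set.seed(1)  # simulated data only, so the run is reproducible

# Assumption: 10 authors with roughly balanced counts summing to 78 essays.
n_per_author <- c(8, 8, 8, 8, 8, 8, 8, 8, 7, 7)
author <- factor(rep(paste0("author", 1:10), times = n_per_author))

# Assumption: 5 word-frequency features; each author has its own mean profile.
p <- 5
centers <- matrix(rnorm(10 * p), nrow = 10, ncol = p)
X <- centers[as.integer(author), ] + matrix(rnorm(78 * p), nrow = 78, ncol = p)

# Check balance across groups before fitting anything.
group_sizes <- table(author)
print(group_sizes)

# Leave-one-out cross-validated LDA: each essay is classified by a model
# fitted to the other 77 essays.
fit <- lda(X, grouping = author, CV = TRUE)
confusion <- table(truth = author, predicted = fit$class)
loo_accuracy <- mean(fit$class == author)
print(loo_accuracy)
```

With real scores in place of the simulated `X`, a leave-one-out accuracy close to 1/10 would suggest the chosen word frequencies do not separate the authors.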

Cheers,
Bert

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Werner Bier
Sent: Sunday, June 12, 2005 12:30 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Essay identification

Hi R-help,

I have a database of 10 students who have written a total of 78 essays. The challenge? I would like to identify who wrote the 79th essay.

Has anybody used R in this context?

Even if not, could you suggest which pattern recognition technique I might apply?

Thanks a lot and regards,
Tom

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Mon Jun 13 07:51:58 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:33 EST