From: <marquis2_at_etu.unige.ch>

Date: Thu 21 Apr 2005 - 20:01:12 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Apr 21 20:17:00 2005

hi!

This is a question about lda (MASS) in R on a particular dataset. I'm not a specialist in any of this, but: first, with the well-known "iris" dataset, I tried using lda to discriminate versicolor from the other two classes and I got approx. 70% accuracy when testing on the training set. In iris, versicolor stands "between" the 2 others, so one can expect lda not to perform well, since it cannot cluster the negative instances (setosa+virginica) together. (Is this correct?) (KNN = 96% in xval.)
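For reference, the iris experiment described above can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact code behind the 70% figure; the names `iris2`, `target`, `fit_iris` and `train_acc` are made up for illustration.

```r
library(MASS)

# Binary target: versicolor versus the two other species (hypothetical names)
iris2 <- data.frame(
  iris[, 1:4],
  target = factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))
)

# Fit lda on all 150 instances and predict the same instances again
fit_iris <- lda(target ~ ., data = iris2)
pred_train <- predict(fit_iris, iris2)$class

# Resubstitution ("test on train set") accuracy and confusion table
train_acc <- mean(pred_train == iris2$target)
print(train_acc)
print(table(actual = iris2$target, predicted = pred_train))
```

Since the accuracy is measured on the training data itself, it is an optimistic estimate.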

Now, I use my "real" dataset (900 instances, 21 attributes), whose 2 classes
can be separated with an accuracy of no more than 80% (10-fold xval) with KNN,
SVM, C4.5 and the like.

So I was very surprised to see that lda also gets an accuracy of 80% on it,
because lda is very simple (finding the best line -- for a 2-class
problem -- and using projections onto the line for classification).

So my question is: how does lda (in MASS) use the projections to make the decision? Usually the decision for a test instance is made using the means and variances of the 2 classes, but there are other possibilities (especially in higher dimensions).
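To make the question concrete, here is a sketch of the "means and variances" rule I have in mind, checked against what `predict()` returns on the iris example. It assumes (my reading of the MASS documentation, not something I am sure of) that the default "plug-in" method scores each class by its squared distance to the projected class mean plus the log prior, with `scaling` normalized so that the within-group variance on the discriminant is 1. The names `iris2`, `z`, `m`, `score`, `manual` and `agree` are made up for illustration.

```r
library(MASS)

# Two-class version of iris: versicolor versus the rest (hypothetical names)
iris2 <- data.frame(
  iris[, 1:4],
  target = factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))
)
fit <- lda(target ~ ., data = iris2)

# Project the instances and the class means onto the single discriminant
X <- as.matrix(iris2[, 1:4])
z <- drop(X %*% fit$scaling)          # one LD1 score per instance
m <- drop(fit$means %*% fit$scaling)  # one projected mean per class

# Assumed rule: Gaussian log-density with unit variance plus log prior
score <- sapply(seq_along(m), function(k) {
  -0.5 * (z - m[k])^2 + log(fit$prior[k])
})
manual <- factor(fit$lev[max.col(score, ties.method = "first")],
                 levels = fit$lev)

# Compare the hand-made decision with the one from predict()
agree <- all(manual == predict(fit, iris2)$class)
print(agree)
```

If the two agree, then with 2 classes the decision is just a threshold on the projection, shifted away from the midpoint of the projected means by the log ratio of the priors.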

Thanks for any ideas; the doc is a bit sparse on this particular matter, and so is Venables & Ripley's book.

Samuel

PS: Does anybody know how to use the CV option of lda to do cross-validation? I can't get it to work.
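For what it is worth, this is roughly what I would expect the CV option to look like, going by the help page: with `CV = TRUE`, `lda` returns a list with components `class` and `posterior` from leave-one-out cross-validation instead of a fitted object, so there is nothing to pass to `predict()`. Note that this is leave-one-out, not 10-fold. The names `iris2`, `cv_fit` and `loo_acc` are made up for illustration.

```r
library(MASS)

# Two-class version of iris: versicolor versus the rest (hypothetical names)
iris2 <- data.frame(
  iris[, 1:4],
  target = factor(ifelse(iris$Species == "versicolor", "versicolor", "other"))
)

# CV = TRUE: each instance is classified by a model fitted without it
cv_fit <- lda(target ~ ., data = iris2, CV = TRUE)

# cv_fit$class holds the held-out predictions, cv_fit$posterior the probabilities
loo_acc <- mean(cv_fit$class == iris2$target)
print(loo_acc)
print(table(actual = iris2$target, predicted = cv_fit$class))
```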
