[R] LDA on pre-assigned training and testing data sets

From: Peter Flom <peterf_at_brainscope.com>
Date: Wed, 25 Jun 2008 09:21:46 -0700


Dear r-help

I am trying to run LDA on a training data set and test it on another data set with the same variables. I found examples that use cross-validation, and examples where the training and testing sets are created with sample(), but none where the two sets are pre-assigned.

Here is what I tried:

# FIRST SET UP A DATAFRAME WITH ALL THE DATA AND CREATE NEW VARIABLES
# Keep only the rows whose DISC_USE1 code is one of the six groups of interest
traintest1 <- arnaudnognod1[arnaudnognod1$DISC_USE1 %in% c(1.01, 1.02, 1.03, 1.04, 1.05, 1.06), ]

# normal is TRUE for groups 1.01, 1.03 and 1.04, FALSE otherwise
traintest1$normal <- traintest1$DISC_USE1 %in% c(1.01, 1.03, 1.04)
# Row-wise mean and standard deviation of the first 40 columns
traintest1$mafelev <- apply(traintest1[, 1:40], 1, FUN = mean)
traintest1$mafscatter <- apply(traintest1[, 1:40], 1, FUN = sd)

# NEXT CREATE TRAINING AND TESTING DATAFRAMES
train <- traintest1[traintest1$DISC_USE1 == 1.01|traintest1$DISC_USE1 == 1.02,]
test <- traintest1[traintest1$DISC_USE1 > 1.02,]

# NOW, TRAIN HAS 400 ROWS, TEST HAS 396 ROWS, AND TRAINTEST1 HAS 796 ROWS, EACH HAS 615 COLUMNS, AS EXPECTED

# RUN DISCRIM ON TRAINING DATA
library(MASS)  # lda() and its predict method come from the MASS package
mafdisc <- lda(normal ~ mafelev + mafscatter, data = train)

#mafdisc$counts IS 210 AND 190, AS EXPECTED

#FINALLY, TEST IT ON THE TEST DATA
mafdiscpred <- predict(mafdisc, data = test)

#BUT mafdiscpred$class HAS LENGTH 400, NOT THE EXPECTED 396.
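
A likely explanation for the length of 400, offered as a sketch rather than a confirmed diagnosis of the code above: the predict method for lda objects in MASS takes the new observations in an argument named newdata, not data. An argument called data is absorbed by "..." and ignored, so predict() falls back to the data the model was fitted on and returns one prediction per training row. The example below reproduces this on made-up data; make_data() and its simulated values are hypothetical stand-ins, and only the row counts and variable names mirror the post.

```r
library(MASS)  # provides lda() and the predict method for lda objects

set.seed(1)

# Hypothetical stand-in for the real data: two predictors and a two-level class.
make_data <- function(n) {
  normal <- rep(c(TRUE, FALSE), length.out = n)
  data.frame(
    normal     = normal,
    mafelev    = rnorm(n, mean = ifelse(normal, 0, 1)),
    mafscatter = rnorm(n, mean = ifelse(normal, 1, 2))
  )
}

train <- make_data(400)  # same row counts as in the post
test  <- make_data(396)

fit <- lda(normal ~ mafelev + mafscatter, data = train)

# 'data' is not an argument of the predict method, so it is silently ignored
# and the training data are predicted instead.
pred_wrong <- predict(fit, data = test)

# 'newdata' is the argument that supplies new observations.
pred_right <- predict(fit, newdata = test)

length(pred_wrong$class)  # equals nrow(train)
length(pred_right$class)  # equals nrow(test)

# Cross-tabulate predicted against actual class on the test set.
table(predicted = pred_right$class, actual = test$normal)
```

If that is the cause, changing the call to predict(mafdisc, newdata = test) should give a class vector with one entry per row of test.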

Any help appreciated.

Thanks,

Peter

Peter L. Flom, PhD
Brainscope, Inc.

212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)




______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 25 Jun 2008 - 16:31:22 GMT

