[R] how to handle missing values in the data?

From: <uttam.phulwale_at_tcs.com>
Date: Thu 06 Oct 2005 - 14:11:52 EST

Hello everybody,
I am referring to David Meyer's "Benchmarking Support Vector Machines", Report No. 78 (Nov. 2002). I am new to R, and I am not sure how missing values are handled in the benchmark datasets. I would be very thankful if you could let me know how to handle missing numerical and categorical variables in the data (e.g. BreastCancer).

I ask because, for SVM, I get fewer predictions from the trained model than there are test observations, so I cannot calculate the confusion matrix. At the same time, the functions lda(), fda(), and rpart() did return as many predictions as there are test observations. So I am quite confused about how these functions handle missing values: are they imputed with the mean, the median, or a new category?
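
To make the question concrete, here is a minimal sketch of the two usual options, dropping incomplete rows or imputing them, using base R only. The toy data frame and the helper impute() are hypothetical and are not taken from the report; the point is only that the labels must come from the same rows that are passed to predict(), otherwise the two vectors differ in length.

```r
# Toy test set with missing values (hypothetical stand-in for a benchmark set).
test <- data.frame(
  x1    = c(1.0, NA, 3.0, 4.0, 5.0),
  x2    = factor(c("a", "b", NA, "a", "b")),
  Class = factor(c("benign", "malignant", "benign", "malignant", "benign"))
)

# Option 1: keep only complete rows and take the labels from those same rows,
# so predictions and labels have equal length.
ok      <- complete.cases(test)
test_cc <- test[ok, ]
ytst_cc <- test_cc$Class

# Option 2: impute. Numeric columns get the column median; factor columns
# get an extra level "missing" (only if they actually contain NA).
impute <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.numeric(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      col <- addNA(col, ifany = TRUE)
      levels(col)[is.na(levels(col))] <- "missing"
    }
    df[[nm]] <- col
  }
  df
}
test_imp <- impute(test)
```

In real use the medians and factor levels would be taken from the training set and applied to the test set; the sketch imputes within one data frame only to stay short.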

I have another problem with the generalized linear model function, glm(). I may have made an error, but I am not sure where.

The script I tried for glm() is:

trdata <- data.frame(train, row.names = NULL)
attach(trdata)

glmmod <- glm(Class ~ ., family = binomial(link = "logit"), data = trdata, maxit = 50)

tstdata <- data.frame(test, row.names = NULL)
attach(tstdata)

xtst <- subset(tstdata, select = -Class)
ytst <- Class

pred <- predict(glmmod, xtst)
library(mda)
confusion(pred, ytst)
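
One possible cause, sketched under assumptions: predict() on a glm object returns values on the link (log-odds) scale by default, so the result is a numeric vector and not a vector of class labels. The sketch below uses made-up toy data (the variables x1 and x2 and the class names are hypothetical), asks for type = "response", thresholds the probabilities at 0.5, and builds the confusion matrix with base table() instead of confusion() from mda. It also avoids attach() by referring to columns through the data frame.

```r
set.seed(1)

# Hypothetical toy data; train/test mirror the names used in the script.
n     <- 200
x1    <- rnorm(n)
x2    <- rnorm(n)
Class <- factor(ifelse(x1 + x2 + rnorm(n) > 0, "malignant", "benign"))
dat   <- data.frame(x1, x2, Class)
train <- dat[1:150, ]
test  <- dat[151:200, ]

glmmod <- glm(Class ~ ., family = binomial(link = "logit"),
              data = train, control = glm.control(maxit = 50))

# type = "response" gives probabilities of the second factor level.
prob <- predict(glmmod, newdata = test, type = "response")

# Turn probabilities into class labels with a 0.5 cut-off.
lev  <- levels(train$Class)
pred <- factor(ifelse(prob > 0.5, lev[2], lev[1]), levels = lev)

# Confusion matrix: rows are predictions, columns are true classes.
conf <- table(predicted = pred, true = test$Class)
print(conf)
```

The 0.5 cut-off is a choice, not a requirement; a different threshold may suit unbalanced classes better.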

Can you help me sort out these problems?

Uttam Phulwale
Tata Consultancy Services Limited
Mailto: uttam.phulwale@tcs.com
Website: http://www.tcs.com
