Re: [R] training svm

From: Soumyadeep nandi <soumyadeep_nandi_at_yahoo.com>
Date: Tue, 11 Mar 2008 21:23:31 -0700 (PDT)


Thanks Oldrich and Max,

I have some more queries.
If I need to train svm() with only one instance, I get the following error:

Error in if (any(co)) { : missing value where TRUE/FALSE needed

Would it be wiser to duplicate the instance with minute changes in the values, or is there some other way to overcome this problem?

Second, if I remove the constant-valued columns from the training dataset, I would also have to remove the same columns from the test dataset, right?

Regards,
Soumyadeep

Oldrich Kruza <sixtease_at_gmail.com> wrote:

Hello Soumyadeep,

Principal Component Analysis tells you which linear combinations of your features are most relevant. So it's not really feature selection. If you want to use PCA, then you have to transform your data so that in each column, there's the linear combination that PCA chose. I think you have to do that yourself and I have no experience with using PCA myself.
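
A minimal sketch of that transformation in R, not a definitive recipe. It assumes the features sit in a numeric data frame; the names `train` and `test`, the toy data, and the choice of k = 2 components are all hypothetical. `prcomp()` from the base stats package computes the components, and `predict()` projects new rows onto them:

```r
# Hypothetical toy data: 6 training rows and 2 test rows, 4 numeric features.
set.seed(1)
train <- as.data.frame(matrix(rnorm(24), nrow = 6,
                              dimnames = list(NULL, paste0("F", 1:4))))
test  <- as.data.frame(matrix(rnorm(8), nrow = 2,
                              dimnames = list(NULL, paste0("F", 1:4))))

# Fit PCA on the training data only; centre and scale each feature.
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Keep the first k principal components (k = 2 is an arbitrary choice).
k <- 2
train_pc <- pca$x[, 1:k, drop = FALSE]

# Project the test rows using the same centring, scaling and rotation.
test_pc <- predict(pca, newdata = test)[, 1:k, drop = FALSE]
```

Here `train_pc` and `test_pc` are ordinary matrices with one column per retained component, so they can be passed on to model building. Note that `scale. = TRUE` stops with an error if a column is constant, so such columns would have to be dropped first.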

I have no experience with or knowledge about Singular Value Decomposition whatsoever, so I'm afraid I can't provide any insight into that.

~ Oldrich

On Fri, Mar 7, 2008 at 9:48 AM, Soumyadeep nandi  wrote:
> Great, I too had the same problem of large data size. But somehow I managed
> to reduce it to a manageable size. I did this before generating the data
> for model building. I still wonder how to reduce the matrix size by PCA. Anyway,
> if required I will have to do that too. BTW, do you know of any tutorial on
> reducing features by PCA or SVD? I find it difficult to work with the matrix,
> because after running PCA on the matrix I want to get a subset of my data as a
> matrix which I can process further (like building a model etc.). What I get is
> some principal components. Anyway, many thanks for the help you have
> extended.
>
> Best regards,
>
> Soumyadeep Nandi
> Research Scholar
> Center for Computational Biology and Bioinformatics
> School of Information Technology
> Jawaharlal Nehru University
> New Delhi 110067
> India
>
> Oldrich Kruza wrote:
> Hello,
>
> I study computational linguistics at Charles University in Prague,
> Czech Republic. Now I'm working on my master's thesis during my
> one-semester stay at Saarland University, Germany.
>
> It's funny - I'm struggling with SVMs right now myself. My data set
> is over 2 GB; I managed to reduce it to about 270 MB by feature
> selection and getting rid of labels and the like. Still, training the
> SVM crashes because of memory exhaustion even on a machine with 16 GB
> of RAM. So that's why I had memory on my mind when replying to
> your question. :-)
>
> ~ Oldrich
>
> On Fri, Mar 7, 2008 at 8:41 AM, Soumyadeep nandi
> wrote:
> > Thanks a lot Oldrich,
> > Yes, it's a good idea to remove the columns before taking the data into R,
> > and you are right, this would reduce the memory load.
> >
> > Thanks a lot, your help is much appreciated. :-)
> >
> > BTW, if you don't mind a personal question, what do you do?
> >
> > With best of my regards,
> > Mr Soumyadeep Nandi
> > Research Scholar
> > Center for Computational Biology and Bioinformatics
> > School of Information Technology
> > Jawaharlal Nehru University
> > New Delhi 110067
> > India
> >
> >
>
> > Oldrich Kruza wrote:
> > Hello Soumyadeep,
> >
> > if you store the data in a tabular file, then I suggest using standard
> > text-editing tools like cut (say your file is called data.csv, fields
> > are separated with commas and you want to get rid of the third and
> > sixth column):
> >
> > $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
> >
> > If you're not in a Unix environment but have perl, then you may use a
> > script like:
> >
> > open SRC, "data.csv" or die("couldn't open source");
> > open DST, ">data_cut.csv" or die("couldn't open destination");
> > while (<SRC>) {
> >     chomp;
> >     @fields = split /,/; # substitute your own delimiter for the comma
> >     # remove the higher index first so the lower one does not shift
> >     splice @fields, 5, 1; # get rid of sixth column (indices are zero-based)
> >     splice @fields, 2, 1; # get rid of third column
> >     print DST join(",", @fields), "\n";
> > }
> > close DST or die("couldn't close destination");
> >
> > If you need to do the selection within R, then you can do it by
> > indexing the data structure. Suppose you have the data in a data.frame
> > called data. Then:
> >
> > > data <- data[,-6]
> > > data <- data[,-3]
> >
> > might do the trick (but since I'm not much of an R hacker, this is
> > without guarantee). I think it might be better however to do the
> > preprocessing before the data get into R because then you avoid
> > loading the columns to discard into memory.
> >
> > Hope this helps
> > ~ Oldrich
> >
> > On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi
> > wrote:
> > > Thanks Oldrich,
> > > Actually I was not sure if I could remove these columns and still build the model.
> > > Thanks a lot for your kind suggestion. Could you tell me if there is any
> > > function to remove these columns from the data matrix?
> > >
> > > With best regards,
> > > Soumyadeep
> > >
> > >
> >
> > > Oldrich Kruza wrote:
> > > A rather technical workaround I see could be adding a row with a
> > > different value. But if a column only ever has one value, then it
> > > contributes nothing to the model and I see no reason why it would have
> > > to be kept.
> > > ~ Oldrich Kruza
> > >
> > > On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi
> > > wrote:
> > > > What should I do if I need to train svm() with data having the same value
> > > > across all rows in some columns? These must be the important features of the
> > > > class, and we can't exclude these columns when building models.
> > > >
> > > > The error I am getting is:
> > > > Error in predict.svm(ret, xhold) : Model is empty!
> > > > In addition: Warning message:
> > > > In svm.default(datatrain, classtrain) :
> > > > Variable(s) 'F112' and 'F113'.... [... truncated]
> > > >
> > > > Is there any way to overcome this problem? Any suggestions would be
> > > > highly helpful.
> > > >
> > > > Regards
> > > > Soumyadeep
> > > >
> > > >
       





R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 12 Mar 2008 - 04:26:57 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 12 Mar 2008 - 04:30:20 GMT.
