Re: [R] training svm

From: Oldrich Kruza <sixtease_at_gmail.com>
Date: Fri, 07 Mar 2008 08:17:31 +0100

Hello Soumyadeep,

if you store the data in a tabular text file, then I suggest using standard text-processing tools like cut (say your file is called data.csv, the fields are separated by commas, and you want to get rid of the third and sixth columns):

$ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
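If you don't know in advance which columns hold a single value (like F112 and F113 in your error message), a small shell sketch can find them and hand the list to cut. This is a sketch under stated assumptions, not a tested tool. It assumes a plain comma-separated file with no quoted fields containing commas, the same number of fields on every row, and GNU cut (for --complement). The optional last argument is the number of header lines to skip when comparing values. The function names constant_fields and drop_constant_fields are made up for this example. Values are compared as strings, so "1" and "1.0" count as different.

```shell
#!/bin/sh
# Hypothetical helpers (names made up for this sketch).
# Assumes a plain comma-separated file: no quoted fields containing commas,
# and the same number of fields on every row.

# constant_fields FILE [HEADER_LINES]
# Prints the 1-based numbers of the columns whose value is identical on every
# data row, as a comma-separated list suitable for cut --fields.
constant_fields() {
    awk -F',' -v header="${2:-0}" '
        NR <= header + 0 { next }              # skip header line(s)
        !seen++ {                              # first data row: remember its values
            n = NF
            for (i = 1; i <= n; i++) { first[i] = $i; same[i] = 1 }
            next
        }
        {
            # compare as strings, not numbers
            for (i = 1; i <= n; i++)
                if (($i "") != (first[i] "")) same[i] = 0
        }
        END {
            out = ""
            for (i = 1; i <= n; i++)
                if (same[i]) out = out (out == "" ? "" : ",") i
            print out
        }' "$1"
}

# drop_constant_fields INFILE OUTFILE [HEADER_LINES]
# Writes INFILE to OUTFILE without its constant columns (needs GNU cut).
drop_constant_fields() {
    _fields=$(constant_fields "$1" "${3:-0}")
    if [ -n "$_fields" ]; then
        cut --complement --delimiter="," --fields="$_fields" < "$1" > "$2"
    else
        cp "$1" "$2"                           # nothing to drop
    fi
}
```

For example, `drop_constant_fields data.csv data_cut.csv 1` would write data_cut.csv without the constant columns, keeping the header line (cut trims it the same way as the data rows). Each row is only compared against the first data row, so the file is read in a single pass and the whole table never has to sit in memory.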

If you're not in a Unix environment but have Perl, you may use a script like:

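 use strict;
 use warnings;

 open(my $src, '<', 'data.csv')     or die("couldn't open source: $!");
 open(my $dst, '>', 'data_cut.csv') or die("couldn't open destination: $!");
 while (my $line = <$src>) {
     chomp $line;
     my @fields = split /,/, $line, -1;   # use your delimiter instead of the comma; -1 keeps trailing empty fields
     # Indices are zero-based (2 is the third column, 5 the sixth). Remove the
     # higher index first, otherwise the sixth column shifts down to index 4.
     splice @fields, 5, 1;    # get rid of sixth column
     splice @fields, 2, 1;    # get rid of third column
     print {$dst} join(",", @fields), "\n";
 }
 close($src);
 close($dst) or die("couldn't close destination: $!");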

If you need to do the selection within R, then you can do it by indexing the data structure. Suppose you have the data in a data.frame called data. Then:

> data <- data[,-6]
> data <- data[,-3]

might do the trick (but since I'm not much of an R hacker, this is without guarantee). I think, however, it might be better to do the preprocessing before the data get into R, because then you avoid loading the columns you are going to discard into memory.

Hope this helps
~ Oldrich

On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi <soumyadeep_nandi_at_yahoo.com> wrote:
> Thanks Oldrich,
> Actually I was not sure whether I could remove these columns and still build
> the model. Thanks a lot for your kind suggestion. Could you tell me if there is
> any function to remove these columns from the data matrix?
>
> With best regards,
> Soumyadeep
>
>
> Oldrich Kruza <sixtease_at_gmail.com> wrote:
> A rather technical workaround I see could be adding a row with a
> different value. But if a column only ever has one value, then it
> contributes nothing to the model and I see no reason why it would have
> to be kept.
> ~ Oldrich Kruza
>
> On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi wrote:
> > What should I do if I need to train svm() with data that have the same
> > value across all rows in some columns? These must be important features
> > of the class, and we can't exclude these columns when building the models.
> >
> > The error I am getting is:
> > Error in predict.svm(ret, xhold) : Model is empty!
> > In addition: Warning message:
> > In svm.default(datatrain, classtrain) :
> > Variable(s) 'F112' and 'F113'.... [... truncated]
> >
> > Is there any way to overcome this problem? Any suggestions would be highly
> > helpful.
> >
> > Regards
> > Soumyadeep
> >
> >
>
>
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 07 Mar 2008 - 07:23:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.