Re: [R] training svm

From: Max Kuhn <mxkuhn_at_gmail.com>
Date: Fri, 07 Mar 2008 15:26:29 -0500

Also, see the nearZeroVar function in the caret package.
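
For what it's worth, here is a minimal sketch of how nearZeroVar might be applied. It assumes the caret package is installed; the data frame x and its column names are made up for illustration and are not from the original question.

```r
# Hypothetical example data: 'const' has a single distinct value.
x <- data.frame(
  a     = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7),
  const = rep(7, 6),
  b     = c(10, 12, 9, 15, 11, 14)
)

x_filtered <- NULL
if (requireNamespace("caret", quietly = TRUE)) {
  # saveMetrics = TRUE returns one row per column, including the
  # logical flags zeroVar and nzv.
  metrics <- caret::nearZeroVar(x, saveMetrics = TRUE)
  print(metrics)

  # Keep only the columns that were not flagged.
  # drop = FALSE keeps a data frame even if a single column is left.
  x_filtered <- x[, !metrics$nzv, drop = FALSE]
  print(names(x_filtered))
}
```

Indexing with the logical nzv flag avoids a common pitfall of the default return value (a vector of column positions): if nothing is flagged, x[, -integer(0)] selects no columns at all.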

Max

On Fri, Mar 7, 2008 at 7:41 AM, Charilaos Skiadas <cskiadas_at_gmail.com> wrote:
>
> On Mar 7, 2008, at 2:17 AM, Oldrich Kruza wrote:
>
> > Hello Soumyadeep,
> >
> > if you store the data in a tabular file, then I suggest using standard
> > text-editing tools like cut (say your file is called data.csv, fields
> > are separated with commas and you want to get rid of the third and
> > sixth column):
> >
> > $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
> >
> > If you're not in a Unix environment but have perl, then you may use a
> > script like:
> >
> > open SRC, "data.csv" or die("couldn't open source");
> > open DST, ">data_cut.csv" or die("couldn't open destination");
> > while (<SRC>) {
> >     chomp;
> >     # substitute the comma for the delimiter you use
> >     @fields = split /,/;
> >     # remove the higher index first so the lower one does not shift
> >     splice @fields, 5, 1; # get rid of sixth column (zero-based, thus 5)
> >     splice @fields, 2, 1; # get rid of third column (zero-based, thus 2)
> >     print DST join(",", @fields), "\n";
> > }
> >
> > If you need to do the selection within R, then you can do it by
> > indexing the data structure. Suppose you have the data in a data.frame
> > called data. Then:
> >
> >> data <- data[,-6]
> >> data <- data[,-3]
> >
> > might do the trick (but since I'm not much of an R hacker, this is
> > without guarantee). I think it might be better however to do the
> > preprocessing before the data get into R because then you avoid
> > loading the columns to discard into memory.
>
> I am guessing that the data is already in R, so it should be easier
> to do it in R, especially if he doesn't know which columns are the
> ones with all identical values. For instance, suppose the data set is
> called x. Then the following would return TRUE for the columns that
> have all values the same:
>
> allsame <- sapply(x,function(y) length(table(y))==1)
>
> and then the following will take them out
>
> newdata <- x[,!allsame]
>
> > Hope this helps
> > ~ Oldrich
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
>
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
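
A slightly more defensive variant of the allsame idea quoted above, as a sketch only. The data frame x and the helper name is_constant are hypothetical. It treats an all-NA column as constant (length(table(y)) is 0 for such a column, so the == 1 test would keep it) and uses drop = FALSE so the result stays a data frame.

```r
# Hypothetical data frame: 'k' is constant, 'n' is entirely NA.
x <- data.frame(
  a = c(1, 2, 3, 4),
  k = c("u", "u", "u", "u"),
  n = c(NA, NA, NA, NA),
  b = c(0.5, NA, 0.5, 0.7)
)

# TRUE when a column has at most one distinct non-missing value.
is_constant <- function(y) length(unique(y[!is.na(y)])) <= 1

allsame <- sapply(x, is_constant)

# drop = FALSE keeps a data frame even if only one column remains.
newdata <- x[, !allsame, drop = FALSE]
print(names(newdata))
```

Whether an all-NA column should count as "all values the same" is a judgment call; change <= 1 to == 1 to get the behaviour of the table() version.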

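On the point about not loading the unwanted columns into memory: if the column positions are known in advance, read.csv can skip them through colClasses, where the string "NULL" means "do not read this column". A minimal sketch, using a temporary file and made-up column names so that it runs on its own:

```r
# Write a small hypothetical CSV so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(c1 = 1:3, c2 = 4:6, c3 = 7:9,
             c4 = 10:12, c5 = 13:15, c6 = 16:18),
  path, row.names = FALSE
)

# NA means "let R guess the type"; "NULL" means "skip this column".
classes <- rep(NA, 6)
classes[c(3, 6)] <- "NULL"

data <- read.csv(path, colClasses = classes)
print(names(data))
```

This requires knowing the total number of columns (6 here), which is an assumption about the file.
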
-- 

Max

Received on Fri 07 Mar 2008 - 20:29:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Mar 2008 - 20:30:20 GMT.
