Re: [R] training svm

From: Charilaos Skiadas <cskiadas_at_gmail.com>
Date: Fri, 07 Mar 2008 07:41:26 -0500

On Mar 7, 2008, at 2:17 AM, Oldrich Kruza wrote:

> Hello Soumyadeep,
>
> if you store the data in a tabular file, then I suggest using standard
> text-editing tools like cut (say your file is called data.csv, fields
> are separated with commas and you want to get rid of the third and
> sixth column):
>
> $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
>
> If you're not in a Unix environment but have Perl, then you can use a
> script like:
>
> use strict;
> use warnings;
> open(my $src, "<", "data.csv") or die("couldn't open source: $!");
> open(my $dst, ">", "data_cut.csv") or die("couldn't open destination: $!");
> while (<$src>) {
>     chomp;
>     my @fields = split /,/, $_, -1;  # replace the comma with the delimiter you use
>     # indices are zero-based; remove the higher one first, otherwise
>     # the later columns shift left and the wrong one is removed
>     splice @fields, 5, 1;  # get rid of sixth column
>     splice @fields, 2, 1;  # get rid of third column
>     print $dst join(",", @fields), "\n";
> }
> close($src);
> close($dst);
>
> If you need to do the selection within R, then you can do it by
> indexing the data structure. Suppose you have the data in a data.frame
> called data. Then:
>
>> data <- data[,-6]
>> data <- data[,-3]
>
> might do the trick (but since I'm not much of an R hacker, this comes
> without guarantee). I think it might be better, however, to do the
> preprocessing before the data get into R, because then you avoid
> loading the columns you discard into memory.
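A minimal R sketch of both routes mentioned above, assuming a small hypothetical six-column file standing in for data.csv (written to a temporary path so the example runs anywhere). Dropping both columns with a single negative index vector avoids any renumbering between steps, and read.csv's colClasses argument, where an entry of "NULL" tells R to skip that column, is one way to keep the discarded columns out of the data frame entirely.

```r
# Hypothetical stand-in for data.csv: six columns named a through f.
csv <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e,f",
             "1,2,3,4,5,6",
             "7,8,9,10,11,12"), csv)

# Option 1: read everything, then drop the third and sixth columns at once.
data <- read.csv(csv)
data_cut <- data[, -c(3, 6)]
names(data_cut)   # "a" "b" "d" "e"

# Option 2: skip those columns while reading.
# In colClasses, "NULL" skips a column; NA lets R guess the type as usual.
classes <- rep(NA, 6)
classes[c(3, 6)] <- "NULL"
data_skip <- read.csv(csv, colClasses = classes)
names(data_skip)  # "a" "b" "d" "e"
```

With option 2 the file is still scanned line by line, but the skipped columns are never stored, which addresses the memory concern without leaving R.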

I am guessing that the data is already in R, so it should be easier to do this in R, especially if he doesn't know in advance which columns hold only identical values. For instance, suppose the data set is a data frame called x. Then the following returns TRUE for each column whose values are all the same:

allsame <- sapply(x,function(y) length(table(y))==1)

and then the following will take them out

newdata <- x[,!allsame]
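A small worked example of the two lines above, run on a hypothetical toy data frame x in which only column b is constant. It also shows one edge case worth knowing: when only a single column survives, `[` collapses the result to a plain vector unless drop = FALSE is passed.

```r
# Hypothetical toy data: only column b holds a single repeated value.
x <- data.frame(a = 1:4, b = rep("k", 4), c = c(2, 2, 3, 3))

# TRUE for each column with exactly one distinct (non-NA) value
allsame <- sapply(x, function(y) length(table(y)) == 1)
allsame          # a FALSE, b TRUE, c FALSE

newdata <- x[, !allsame]
names(newdata)   # "a" "c"

# Edge case: with only one column left, [ drops to a vector by default
z <- data.frame(p = 1:3, q = c(0, 0, 0))
keep <- !sapply(z, function(y) length(table(y)) == 1)
z[, keep]                 # the plain vector 1 2 3
z[, keep, drop = FALSE]   # still a one-column data frame
```

Note that table() ignores NA by default, so a column such as c(5, NA, 5) counts as constant while an all-NA column does not; length(unique(y)) == 1 is one alternative if NA should count as a distinct value.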

> Hope this helps
> ~ Oldrich

Haris Skiadas
Department of Mathematics and Computer Science Hanover College



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 07 Mar 2008 - 13:06:27 GMT