Re: [R] Create Matrix from Loop of Vectors, Sort It and Pick Top-K

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Thu, 19 Jun 2008 11:46:27 -0500

On 06/19/2008 09:59 AM Gundala Viswanath wrote:
> Hi,
>
> I have the following dataset (simplified for example).
>
> __DATA__
> 300.35 200.25 104.30
> 22.00 31.12 89.99
> 444.50 22.10 43.00
> 22.10 200.55 66.77
>
> Now from that I wish to do the following:
>
> 1. Compute variance of each row
> 2. Pick the top-2 rows with the highest variance
> 3. Store those selected rows for further processing
>
> To achieve this, I tried to: a) read the table and compute the
> variance of each row, b) append each variance to its original
> row in a vector, c) store those vectors in a multidimensional array (matrix),
> d) sort that array. But I am stuck at step (b).
>
> Can anybody suggest what's the best way to achieve
> my aim above?
>
> This is the sample code I have so far (not working).
>
> __BEGIN__
>
> #data <- read.table("testdata.txt")
>
>
> # Is this a right way to initialize?
> all.arr = NULL
>
> for (gi in 1:nofrow) {
> gex <- as.vector(data.matrix(data[gi,],rownames.force=FALSE))
>
> #compute variance
> gexvar <- var(gex)
>
> # join variance with its original vector
> nvec <- c(gexvar,gex)
>
> # I'm stuck here.....This doesn't seem to work
> all.arr <- data.frame(nvec)
> }
>
> print(all.arr)
> __END__
> --

If your data is contained in a data frame 'DF':

 > DF

      V1     V2     V3
1 300.35 200.25 104.30
2  22.00  31.12  89.99
3 444.50  22.10  43.00
4  22.10 200.55  66.77
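For anyone who wants to follow along without a file, one way to build the same 'DF' is to hand the posted values to read.table() through its 'text' argument (a feature of more recent R versions than the one current for this thread). With header = FALSE, read.table() names the columns V1, V2 and V3, as in the output above. This is just a convenience sketch; reading "testdata.txt" as in the question gives the same result:

```r
# Build the example data frame from the values posted in the question.
# 'text' lets read.table() parse a character string instead of a file;
# the leading blank line is skipped (blank.lines.skip = TRUE by default).
DF <- read.table(text = "
300.35 200.25 104.30
22.00  31.12  89.99
444.50 22.10  43.00
22.10 200.55  66.77
", header = FALSE)

str(DF)  # should report 4 obs. of 3 numeric variables V1, V2, V3
```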


# Get row-wise variances and cbind() them to DF
 > DF.var <- cbind(DF, var = apply(DF, 1, var, na.rm = TRUE))

 > DF.var

      V1     V2     V3       var
1 300.35 200.25 104.30  9610.336
2  22.00  31.12  89.99  1361.915
3 444.50  22.10  43.00 56676.803
4  22.10 200.55  66.77  8622.817
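apply() calls var() once per row, which is perfectly fine at this size. As a cross-check, or for a much larger matrix, the same sample variances (denominator n - 1, as var() uses) can be computed in vectorized form with rowMeans() and rowSums(). This is an alternative sketch rather than what is used above, and it assumes there are no missing values, so unlike the call above it does not handle na.rm = TRUE:

```r
# Example data from the question (assumed complete, no NAs).
DF <- data.frame(V1 = c(300.35, 22.00, 444.50, 22.10),
                 V2 = c(200.25, 31.12, 22.10, 200.55),
                 V3 = c(104.30, 89.99, 43.00, 66.77))
m <- as.matrix(DF)

# Subtracting a length-nrow vector from a matrix recycles it down the
# columns, so every element is centred on its own row mean.
dev <- m - rowMeans(m)

# Sample variance per row: sum of squared deviations / (n - 1).
row.var <- rowSums(dev^2) / (ncol(m) - 1)

# Should agree with the apply() approach.
all.equal(unname(row.var), unname(apply(m, 1, var)))
```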


# Sort DF by 'var' using order()
 > DF.var[order(DF.var$var, decreasing = TRUE), ]

      V1     V2     V3       var
3 444.50  22.10  43.00 56676.803
1 300.35 200.25 104.30  9610.336
4  22.10 200.55  66.77  8622.817
2  22.00  31.12  89.99  1361.915


To get the top 2, you can take a couple of approaches:

 > DF.var[order(DF.var$var, decreasing = TRUE)[1:2], ]

      V1     V2    V3       var
3 444.50  22.10  43.0 56676.803
1 300.35 200.25 104.3  9610.336

or

 > head(DF.var[order(DF.var$var, decreasing = TRUE), ], 2)

      V1     V2    V3       var
3 444.50  22.10  43.0 56676.803
1 300.35 200.25 104.3  9610.336
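If the number of rows to keep varies, the head(order(...)) idiom can be wrapped in a small function. 'top_k_by_var' below is a hypothetical helper written for illustration, not part of base R; it simply strings together the apply(), cbind(), order() and head() steps shown above:

```r
# Hypothetical helper: return the k rows of a numeric data frame with
# the largest row-wise variances, highest first, with 'var' attached.
top_k_by_var <- function(df, k = 2) {
  v <- apply(df, 1, var, na.rm = TRUE)
  out <- cbind(df, var = v)
  # head() returns all rows if k exceeds nrow(df)
  head(out[order(v, decreasing = TRUE), ], k)
}

DF <- data.frame(V1 = c(300.35, 22.00, 444.50, 22.10),
                 V2 = c(200.25, 31.12, 22.10, 200.55),
                 V3 = c(104.30, 89.99, 43.00, 66.77))

top2 <- top_k_by_var(DF, 2)
rownames(top2)  # "3" "1", the same rows as above
```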

See ?cbind, ?apply, ?order and ?head for more information.
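For completeness, the loop in the question stalled at step (b) because each pass assigned a brand-new one-column data frame to 'all.arr', discarding everything stored before; 'nofrow' is also not defined in the posted code. A minimal repair, keeping the question's variable names, collects one vector per row in a list and rbind()s them once at the end. This is only a sketch of the loop route; the vectorized apply() approach above is simpler:

```r
DF <- data.frame(V1 = c(300.35, 22.00, 444.50, 22.10),
                 V2 = c(200.25, 31.12, 22.10, 200.55),
                 V3 = c(104.30, 89.99, 43.00, 66.77))

nofrow <- nrow(DF)
rows <- vector("list", nofrow)   # one slot per input row

for (gi in seq_len(nofrow)) {
  gex <- unlist(DF[gi, ], use.names = FALSE)  # row gi as a numeric vector
  gexvar <- var(gex)                          # (a) variance of the row
  rows[[gi]] <- c(gexvar, gex)                # (b) variance, then the row
}

# (c) bind the collected vectors into a matrix, one row each
all.arr <- do.call(rbind, rows)
colnames(all.arr) <- c("var", names(DF))

# (d) sort by variance, highest first, and keep the top 2
top2 <- all.arr[order(all.arr[, "var"], decreasing = TRUE)[1:2], ]
print(top2)
```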

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 19 Jun 2008 - 18:18:35 GMT
