From: Duncan Murdoch <murdoch_at_stats.uwo.ca>

Date: Thu 06 Jul 2006 - 22:46:51 EST

On 7/6/2006 8:18 AM, Srinivas Iyyer wrote:

> hi:
>
> I have matrix with dimensions(200 X 20,000). I have
> another file, a tab-delim file where first column
> variables are row names and second column variables
> are column names.
>
> For instance:
>
>> tmat
>   Apple Orange Mango Grape Star
> A     0      0     0     0    0
> O     0      0     0     0    0
> M     0      0     0     0    0
> G     0      0     0     0    0
> S     0      0     0     0    0
>
>> tb # tab- delim file.
>       V1 V2
> 1  Apple  S
> 2  Apple  A
> 3  Apple  O
> 4 Orange  A
> 5 Orange  O
> 6 Orange  S
> 7  Mango  M
> 8  Mango  A
> 9  Mango  S
>
> I have to read each line of the 'tb' (tab delim file),
> take the first variable, check if matches any rowname
> of the matrix. Take the second variable of the row in
> and check if it matches any column name. If so, put
> 1 else leave it.
>
> The following is a small piece of code that, I felt is
> a solutions. However, since my original matrix and
> tab-delim file is very very huge, I am not sure if it
> is really doing the correct thing. Could any one
> please help me if I am doing this correct.
>
>> for(i in 1:length(tb[,1])){
> + r = tb[i,1]
> + c = as.character(tb[i,2])
> + tmat[rownames(tmat)==c,colnames(tmat)==r] <-1
> + }

I think that works, but it's not as fast as some other ways of doing the same thing. For example, table(tb) will give you a table of the counts of each pair of entries in tb. pmin(table(tb), 1) will set the maximum count to 1.

An advantage of this approach is that it will show you if there are any entries in tb that aren't in your tmat (typos, etc.). A disadvantage is that if there are any missing categories (e.g. G, Grape, Star in your sample) they won't show up at all, and you may need some manipulations to get things to look exactly the way you asked. For example,

> pmin(table(tb), 1)
        V2
V1       A M O S
  Apple  1 0 1 1
  Mango  1 1 0 1
  Orange 1 0 1 1

> pmin(table(tb[,2:1]), 1)
   V1
V2  Apple Mango Orange
  A     1     1      1
  M     0     1      0
  O     1     0      1
  S     1     1      1

Duncan Murdoch
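One possible way around the missing-category caveat mentioned in the reply is to convert both columns to factors with explicit levels before calling table(). This is only a sketch: the data frame below retypes the sample data from the question, and the names row_levels, col_levels, counts and filled are made up for illustration. Note the trade-off: entries of tb that are not among the given levels become NA and are dropped silently, so the typo-spotting advantage of plain table(tb) is lost.

```r
# Sample data retyped from the question (assumed to be plain character columns).
tb <- data.frame(
  V1 = c("Apple", "Apple", "Apple", "Orange", "Orange", "Orange",
         "Mango", "Mango", "Mango"),
  V2 = c("S", "A", "O", "A", "O", "S", "M", "A", "S"),
  stringsAsFactors = FALSE
)

# Full sets of names, copied from the dimnames of tmat in the question.
row_levels <- c("A", "O", "M", "G", "S")
col_levels <- c("Apple", "Orange", "Mango", "Grape", "Star")

# Explicit factor levels make table() keep categories that never occur.
counts <- table(
  factor(tb$V2, levels = row_levels),
  factor(tb$V1, levels = col_levels)
)

# Cap the counts at 1 and rebuild a plain 0/1 matrix with the wanted dimnames.
filled <- matrix(
  pmin(as.vector(counts), 1),
  nrow = length(row_levels),
  dimnames = list(row_levels, col_levels)
)
print(filled)
```

With the sample data, the G row and the Grape and Star columns appear in the result as all zeros instead of being left out.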

>> tmat
>   Apple Orange Mango Grape Star
> A     1      1     1     0    0
> O     1      1     0     0    0
> M     0      0     1     0    0
> G     0      0     0     0    0
> S     1      1     1     0    0
>
> Thanks.
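If the goal is to fill the existing tmat in place rather than build a new table, the loop in the question could also be replaced by a single assignment using a two-column index matrix. This is a different technique from the table()/pmin() suggestion in the reply, sketched here under the assumption that tb holds the sample data from the question; the names ri, ci and ok are illustrative only.

```r
# Empty 0/1 matrix with the dimnames shown in the question.
tmat <- matrix(
  0, nrow = 5, ncol = 5,
  dimnames = list(c("A", "O", "M", "G", "S"),
                  c("Apple", "Orange", "Mango", "Grape", "Star"))
)

# Sample data retyped from the question.
tb <- data.frame(
  V1 = c("Apple", "Apple", "Apple", "Orange", "Orange", "Orange",
         "Mango", "Mango", "Mango"),
  V2 = c("S", "A", "O", "A", "O", "S", "M", "A", "S"),
  stringsAsFactors = FALSE
)

# Position of each pair in the row and column names; NA when there is no match.
ri <- match(tb$V2, rownames(tmat))
ci <- match(tb$V1, colnames(tmat))

# Keep only pairs found in both sets of names, then set all of them at once.
ok <- !is.na(ri) & !is.na(ci)
tmat[cbind(ri[ok], ci[ok])] <- 1
print(tmat)
```

The rows of tb where ok is FALSE are the ones that match no row or column name, so tb[!ok, ] can be inspected to spot typos.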

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Thu Jul 06 22:49:42 2006
