[R] Beginner help with retrieving frequency and transforming a matrix

From: Sean MacEachern <sean.maceachern_at_ARS.USDA.GOV>
Date: Fri, 28 Mar 2008 10:20:33 -0400


Hi All,

Just hoping someone can give me a hand with a problem.

I have a data frame (DF) with about 5 million entries that looks something like the following:

> DF
    ID  Cl Co  Brd    Ind A AB AB
1  S-3 IND  A BR_F BR_F01 1  0  0
2  S-3 IND  A BR_F BR_F01 1  0  0
3  S-3 IND  A BR_F BR_F01 1  0  0
4  S-3 IND  A BR_F BR_F01 1  0  0
5  S-3 IND  A BR_F BR_F01 1  0  0
6  S-3 IND  A BR_F BR_F01 0  1  0
7  S-3 IND  A BR_F BR_F02 0  0  1
8  S-3 IND  A BR_F BR_F02 0  1  0
9  S-3 IND  A BR_F BR_F02 1  0  0
10 S-3 IND  A BR_F BR_F02 1  0  0
11 S-3 IND  A BR_F BR_F02 1  0  0
12 S-3 IND  A BR_F BR_F02 1  0  0

I want to get the frequency of A for each group of rows that share the same Ind code.

First, I created a column called 'frq' that holds each individual's A frequency (1 if A == 1, 0.5 if AB == 1, and 0 otherwise):

>DF$frq=apply(DF,1,function(x) if(x[6]==1)1 else if (x[7]==1)0.5 else 0)

> DF
    ID  Cl Co  Brd    Ind A AB AB frq
1  S-3 IND  A BR_F BR_F01 1  0  0   1
2  S-3 IND  A BR_F BR_F01 1  0  0   1
3  S-3 IND  A BR_F BR_F01 1  0  0   1
4  S-3 IND  A BR_F BR_F01 1  0  0   1
5  S-3 IND  A BR_F BR_F01 1  0  0   1
6  S-3 IND  A BR_F BR_F01 0  1  0 0.5
7  S-3 IND  A BR_F BR_F02 0  0  1   0
8  S-3 IND  A BR_F BR_F02 0  1  0 0.5
9  S-3 IND  A BR_F BR_F02 1  0  0   1
10 S-3 IND  A BR_F BR_F02 1  0  0   1
11 S-3 IND  A BR_F BR_F02 0  1  0 0.5
12 S-3 IND  A BR_F BR_F02 1  0  0   1
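A minimal sketch of a vectorised alternative to the apply() call. With about 5 million rows this may matter, because apply() first converts the data frame to a character matrix and then calls the function once per row. The data below are a hypothetical reconstruction of the twelve rows in the printout above, keeping only the columns needed; the third genotype column is called B here, which is an assumed name since the header above repeats AB.

```r
# Hypothetical reconstruction of the 12 example rows (only the columns needed).
# "B" is an assumed name for the third genotype column; the original header
# repeats "AB".
DF <- data.frame(
  ID  = rep("S-3", 12),
  Ind = rep(c("BR_F01", "BR_F02"), each = 6),
  A   = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1),
  AB  = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  B   = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Same rule as the apply() version, evaluated on whole columns at once:
# 1 where A == 1, 0.5 where AB == 1, 0 otherwise.
DF$frq <- ifelse(DF$A == 1, 1, ifelse(DF$AB == 1, 0.5, 0))

print(DF$frq)
```

If exactly one of the three genotype columns is 1 in every row, DF$A + 0.5 * DF$AB gives the same values even more simply.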

Then I created a new data frame, DF2, containing only the columns I'm interested in (ID, Ind and frq):

>DF2 = cbind(DF[1],DF[5],DF[9])

> DF2
    ID    Ind frq
1  S-3 BR_F01   1
2  S-3 BR_F01   1
...
11 S-3 BR_F02 0.5
12 S-3 BR_F02   1

Is there a function I can call to calculate the frequency of A (i.e., the mean of frq) for all individuals with the same Ind code, so that the result looks something like the following? (I saw something in a tutorial based on t-tests that I thought would work, but no joy.)

> NewDF
   ID    Ind    frq
1 S-3 BR_F01 0.9167
2 S-3 BR_F02 0.6667
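One way to get the per-group means is aggregate(), grouping by both ID and Ind so that each ID/Ind combination gets its own row. This is a sketch on a hypothetical DF2 rebuilt from the frq values in the printout above, not a tested solution for the full data set. If only Ind matters, tapply(DF2$frq, DF2$Ind, mean) returns the same numbers as a named vector.

```r
# Hypothetical DF2 rebuilt from the frq values shown in the printout above.
DF2 <- data.frame(
  ID  = rep("S-3", 12),
  Ind = rep(c("BR_F01", "BR_F02"), each = 6),
  frq = c(1, 1, 1, 1, 1, 0.5, 0, 0.5, 1, 1, 0.5, 1),
  stringsAsFactors = FALSE
)

# Mean frq for every ID/Ind combination. The 'by' list supplies the grouping
# columns, which become the first columns of the result; the aggregated
# column keeps the name "frq".
NewDF <- aggregate(DF2["frq"], by = DF2[c("ID", "Ind")], FUN = mean)
print(NewDF)
```

For these twelve rows the means are 5.5/6 (about 0.9167) for BR_F01 and 4/6 (about 0.6667) for BR_F02, matching NewDF above.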

Further, is there a way to then transform the result into a matrix with one row per Ind code and one column per ID, like the following?

> FinalDF
Ind       S-3  S-4  S-5 ... S-1000000
BR_F01 0.9167  0.5    1 ...    0.6667
BR_F02 0.6667  0.2    1 ...       0.5
...
BR_Z98    0.5    1  0.3 ...         1
BR_Z99      1  0.6    1 ...       0.5
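For the wide layout, tapply() with two grouping factors computes the means and arranges them into a matrix in one step, with one row per Ind and one column per ID, so the intermediate NewDF is not strictly needed. This is a minimal sketch: the S-4 rows and all frq values below are hypothetical, chosen only to show the shape of the result.

```r
# Hypothetical long-format data: two IDs, two Ind codes, two individuals each.
DF2 <- data.frame(
  ID  = rep(c("S-3", "S-4"), each = 4),
  Ind = rep(c("BR_F01", "BR_F01", "BR_F02", "BR_F02"), times = 2),
  frq = c(1, 0.5, 0, 1, 0.5, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Group frq by Ind (rows) and ID (columns) and take the mean of each cell.
# The result is a matrix whose dimnames are the sorted Ind and ID values;
# combinations that never occur would be NA.
FinalDF <- tapply(DF2$frq, list(DF2$Ind, DF2$ID), mean)
print(round(FinalDF, 4))
```

With very many IDs the full Ind by ID matrix has one cell per combination, so it may need a lot of memory and be mostly NA if most IDs are not typed for every Ind.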



Thanks in advance for any help you can offer, and please let me know if there is any further information I can provide.

Sean

> sessionInfo()

R version 2.6.0 (2007-10-03)
i386-apple-darwin8.10.1

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Fri 28 Mar 2008 - 14:40:07 GMT

