Re: [R] how to "singlify" entries

From: Charles Plessy <charles-r-nospam_at_plessy.org>
Date: Tue 31 May 2005 - 00:54:59 EST

On Mon, May 30, 2005 at 09:09:27AM -0400, Gabor Grothendieck wrote :

> Try using reshape, e.g. if dd is your data frame:
>
> reshape(dd, dir = "wide", idvar = "F1", timevar = "F2",
> varying = list(c("VX","VY")))

Thank you very much, and to Petr Pikal too. Reshape is exactly what I had forgotten.

The bad news is that I had simplified my example; my real situation is slightly more complex.

I have three factors and one value:

> count_per_tc[1:10,]

   rna   lib           tc x
1  CAB 114BA T01F00380F47 1
2  CAE 114BB T01F00381273 1
3  CAJ 114BA T01F0048F6D1 1
4  CAB 114BC T01F0048F6D1 1
5  CAB 114BA T01F00498689 2
6  CAC 114BA T01F00498689 1
7  CAE 114BA T01F00498689 2
8  CAG 114BA T01F00498689 2
9  CAH 114BA T01F00498689 1
10 CAI 114BA T01F00498689 2

I would like a data frame where I have the value of x for each combination of "rna" and "lib", for each "tc":

> reshape(count_per_tc[1:10,], direction="wide", timevar="tc", idvar=c("rna","lib"))

   rna   lib x.T01F00380F47 x.T01F00381273 x.T01F0048F6D1 x.T01F00498689
1  CAB 114BA              1             NA             NA              2
2  CAE 114BB             NA              1             NA             NA
3  CAJ 114BA             NA             NA              1             NA
4  CAB 114BC             NA             NA              1             NA
6  CAC 114BA             NA             NA             NA              1
7  CAE 114BA             NA             NA             NA              2
8  CAG 114BA             NA             NA             NA              2
9  CAH 114BA             NA             NA             NA              1
10 CAI 114BA             NA             NA             NA              2

Oops, I wanted it the other way round:

> t(reshape(count_per_tc[1:10,], direction="wide", timevar="tc", idvar=c("rna","lib")))

               1       2       3       4       6       7       8       9       10     
rna            "CAB"   "CAE"   "CAJ"   "CAB"   "CAC"   "CAE"   "CAG"   "CAH"   "CAI"  
lib            "114BA" "114BB" "114BA" "114BC" "114BA" "114BA" "114BA" "114BA" "114BA"
x.T01F00380F47 " 1"    NA      NA      NA      NA      NA      NA      NA      NA     
x.T01F00381273 NA      " 1"    NA      NA      NA      NA      NA      NA      NA     
x.T01F0048F6D1 NA      NA      " 1"    " 1"    NA      NA      NA      NA      NA     
x.T01F00498689 " 2"    NA      NA      NA      " 1"    " 2"    " 2"    " 1"    " 2"   
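
A note on the transposition: t() first converts the data frame to a matrix, and because rna and lib hold text, every cell becomes a character string, which is why the counts above are printed in quotes. One possible alternative, sketched here and not necessarily the best fit for the full table, is to swap the roles inside reshape(): use tc as idvar and a combined rna/lib label as timevar. The column name "sample" and the "_" separator are assumptions made for illustration; the ten rows are retyped from the listing above.

```r
# First ten rows of count_per_tc, retyped by hand from the listing above
count_per_tc <- data.frame(
  rna = c("CAB", "CAE", "CAJ", "CAB", "CAB", "CAC", "CAE", "CAG", "CAH", "CAI"),
  lib = c("114BA", "114BB", "114BA", "114BC", "114BA",
          "114BA", "114BA", "114BA", "114BA", "114BA"),
  tc  = c("T01F00380F47", "T01F00381273", "T01F0048F6D1", "T01F0048F6D1",
          "T01F00498689", "T01F00498689", "T01F00498689", "T01F00498689",
          "T01F00498689", "T01F00498689"),
  x   = c(1, 1, 1, 1, 2, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# t() on a data frame with text columns gives a character matrix
transposed <- t(count_per_tc[, c("rna", "lib", "x")])
is.character(transposed)

# Hypothetical helper column: one label per rna/lib combination
count_per_tc$sample <- paste(count_per_tc$rna, count_per_tc$lib, sep = "_")

# One row per tc, one numeric column per rna/lib combination
wide <- reshape(count_per_tc[, c("tc", "sample", "x")],
                direction = "wide", idvar = "tc", timevar = "sample")

# Drop the "x." prefix that reshape() puts in front of each new column
names(wide) <- sub("^x\\.", "", names(wide))
```

In this layout the counts stay numeric and each rna/lib combination is an ordinary column, so comparisons such as wide$CAB_114BA > 0 work directly. How reshape() behaves on the full 200,000-row table has not been checked here.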

The ultimate goal is (after proper renaming of the columns) to do things like

plot(CAA_114BA[CAA_114BA > 0 & CAA_114BB > 0], CAA_114BB[CAA_114BA > 0 & CAA_114BB > 0])

(this combination will appear if I reshape the whole data frame, which has 200,000 rows.)

and then proper statistical tests (which I still have to learn / remember from 12 years ago).
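
A minimal sketch of that kind of plot, assuming the reshaped table has one numeric column per rna/lib combination and NA wherever a tc was not observed. The values below are invented toy numbers, not taken from the real data, and the column names CAA_114BA and CAA_114BB are hypothetical renamings. Because a comparison involving NA returns NA, which() is used so that only rows where the condition is definitely TRUE are kept.

```r
# Invented toy counts (NOT real data); NA marks a tc absent from that sample
wide <- data.frame(
  tc        = c("tc1", "tc2", "tc3", "tc4", "tc5"),
  CAA_114BA = c(3, NA, 1, 5, NA),
  CAA_114BB = c(2, 4, NA, 7, NA)
)

# Row positions where both samples have a positive count; which() drops
# the NA results of the comparison
keep <- which(wide$CAA_114BA > 0 & wide$CAA_114BB > 0)

# Scatter plot of the two samples against each other, shared tcs only
plot(wide$CAA_114BA[keep], wide$CAA_114BB[keep],
     xlab = "CAA_114BA", ylab = "CAA_114BB")
```

Computing the index once and reusing it for both axes avoids repeating the condition and guarantees that the two vectors have the same length.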

Once again, thank you, and please warn me if I am doing something stupid by transposing the reshaped table.

Best regards,

-- 
Charles
