[R] help with colsplit (reshape)

From: Ista Zahn <istazahn_at_gmail.com>
Date: Fri, 13 Jun 2008 11:46:06 -0400


Dear list,

I'm trying to figure out how to use the reshape package to reshape data from a "wide" format to a "long" format. I have data like this

pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2, out.1, out.2)

and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8), idvar="pid", v.names=c("predB", "predC", "out"), timevar="measure.num", times=c(1,2), direction="long"))

     pid predA measure.num predB predC out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package" as a guide, I tried the following:

M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names = c("treatment", "time")))

but this gave a warning and resulted in

head(M.Data2)

   pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html, which led me to try

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.", names = c("treatment", "time")))

which gave:

head(M.Data2)

   pid variable value treatment time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.
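One plausible reading of the "predA predA" rows, assuming the split pieces are row-bound into a matrix (an assumption about colsplit's internals, not something confirmed in this thread): "predA" has no period, so it yields a single piece that is recycled across both columns. The sketch below uses base R only; `variable` is a hypothetical stand-in for M.Data$variable, and the sub()/grepl() approach is a substitute technique, not the reshape package's own method.

```r
# Hypothetical stand-in for M.Data$variable; base R only.
variable <- c("predA", "predB.1", "predC.2", "out.1")

# Row-binding pieces of unequal length recycles the shorter one,
# which would explain a row reading "predA predA".
pieces <- strsplit(c("predA", "predB.1"), "\\.")
recycled <- do.call(rbind, pieces)
print(recycled)

# Alternative: derive stem and suffix separately so that names
# without a ".<digits>" suffix get NA instead of a recycled value.
has_suffix <- grepl("\\.[0-9]+$", variable)

# Stem: the name with any trailing ".<digits>" removed.
treatment <- sub("\\.[0-9]+$", "", variable)

# Suffix as an integer; stays NA where the name has no suffix.
time <- rep(NA_integer_, length(variable))
time[has_suffix] <- as.integer(sub("^.*\\.", "", variable[has_suffix]))

print(data.frame(variable, treatment, time))
```

Columns built this way could then be bound to the melted data in place of the colsplit result.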

I would be grateful if someone could tell me (a) how to reshape the data as described above using the reshape package, (b) what the difference between split = "." and split = "\\." is, and (c) whether more information about the colsplit command is available anywhere.
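On question (b), a minimal sketch using base R's strsplit(), on the assumption (not confirmed in this thread) that colsplit treats `split` as a regular expression in the same way. In a regular expression an unescaped "." matches any character, while "\\." matches a literal period.

```r
# Base R only; strsplit() stands in for whatever colsplit() does
# internally (an assumption).
x <- c("predA", "predB.1")

# "." as a regex matches every character, so each character acts as a
# separator and only empty strings remain.
regex_split <- strsplit(x, ".")
print(regex_split)

# Escaping the dot splits on a literal period.
literal_split <- strsplit(x, "\\.")
print(literal_split)

# fixed = TRUE turns off regex interpretation and gives the same result.
fixed_split <- strsplit(x, ".", fixed = TRUE)
print(fixed_split)
```

With the unescaped ".", "predB.1" yields seven empty pieces, which would be consistent with the seven columns (treatment, time, and five NA columns) in the first result above.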

Thank you very much in advance,
Ista



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Fri 13 Jun 2008 - 19:22:17 GMT