[R] Errors melt()ing data...

From: Neil Shephard <nshephard_at_gmail.com>
Date: Thu, 28 Feb 2008 11:42:14 +0000


Hi,

I'm trying to melt() some data for subsequent cast()ing and am encountering errors.

The overall process requires a couple of cast()s and melt()s.

########Start Session 1##########
## I have the data in a (fully) melted format and can cast it fine...
> norm1[1:10,]

   Pool       SNP Sample.Name variable       value
1     1 rs1045485      CA0092 Height.1 0.003488853
2     1 rs1045485      CA0142 Height.2 0.333274200
3     1 rs1045485      CO0007 Height.2 0.396250961
4     1 rs1045485      CA0047 Height.2 0.535686831
5     1 rs1045485      CO0149 Height.2 0.296611673
6     1 rs1045485      CA0106 Height.2 0.786115546
7     1 rs1045485      CO0191 Height.1 0.669268523
8     1 rs1045485      CA0097 Height.2 0.609603217
9     1 rs1045485      CA0076 Height.1 0.004257584
10    1 rs1045485      CO0017 Height.2 0.589261427
## This gets the data
> t.norm1 <- cast(norm1, Sample.Name + SNP + Pool ~ variable, sum)
> t.norm1[1:10,]
   Sample.Name        SNP Pool    Height.1  Height.2
1       CA0001  rs1045485    1 0.003311454 0.4789782
2       CA0001  rs1045487    1 0.001818583 0.5089827
3       CA0001 rs11212570    1 0.006078444 0.4496129
4       CA0001 rs13010627    1 0.008753049 0.5424499
5       CA0001    rs13113    1 0.186821486 0.2294912
6       CA0001 rs13402616    1 0.012030235 0.4161610
7       CA0001   rs170548    1 0.002425579 0.3111907
8       CA0001 rs17503908    1 0.002179705 0.3063292
9       CA0001  rs1799794    1 0.003632984 0.5049848
10      CA0001  rs1799796    1 0.389774160 0.0000000
## I now melt it and cast again to the desired format

> t <- melt(t.norm1, id = c("Sample.Name", "SNP"))
> cast.height.norm1 <- cast(t, SNP ~ Sample.Name + variable, sum)
> cast.height.norm1[1:10,1:5]
          SNP CA0001_Height.1 CA0001_Height.2 CA0002_Height.1 CA0002_Height.2
1   rs1045485     0.003311454       0.4789782     0.401218142     0.343031163
2   rs1045487     0.001818583       0.5089827     0.007329439     0.453102612
3  rs11212570     0.006078444       0.4496129     0.015164118     0.434320814
4  rs13010627     0.008753049       0.5424499     0.013440474     0.463863778
5     rs13113     0.186821486       0.2294912     0.224865477     0.272916077
6  rs13402616     0.012030235       0.4161610     0.191099755     0.285744704
7    rs170548     0.002425579       0.3111907     0.365986770     0.240187431
8  rs17503908     0.002179705       0.3063292     0.011100347     0.232259627
9   rs1799794     0.003632984       0.5049848     0.430635350     0.008364312
10  rs1799796     0.389774160       0.0000000     0.173564141     0.235928006
########Finish Session 1##########

This is the format that I'm aiming for, and everything has worked fine. However, I wish to derive two transformed variables (polar.1 and polar.2) from each row of t.norm1 and then melt() and cast() the data into the same desired format.
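For reference, the two derived variables can be sketched in base R on a toy data frame. The values are copied from the first three rows of t.norm1 printed above; the name `toy` and the extra column polar.2.alt are hypothetical and not part of the original session. The atan2() line is an assumption-labelled alternative, shown only because it stays defined when both heights are zero.

```r
# Toy data: first three rows of t.norm1 as printed above
# ('toy' is a hypothetical name used only for this sketch)
toy <- data.frame(
  Sample.Name = "CA0001",
  SNP         = c("rs1045485", "rs1045487", "rs11212570"),
  Height.1    = c(0.003311454, 0.001818583, 0.006078444),
  Height.2    = c(0.4789782, 0.5089827, 0.4496129)
)

# Same transformations as in the post: log10 of the radius, and the angle
toy$polar.1 <- log10(sqrt(toy$Height.1^2 + toy$Height.2^2))
toy$polar.2 <- atan(toy$Height.2 / toy$Height.1)

# Assumption (not in the original post): atan2() returns the same angle when
# Height.1 > 0 and avoids the NaN that atan(0 / 0) would produce
toy$polar.2.alt <- atan2(toy$Height.2, toy$Height.1)

print(toy)
```

Both transformations are vectorised, so they apply row by row without an explicit loop.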

########Start Session 2##########
## Now generate polar co-ordinates
t.norm1$polar.1 <- log10(sqrt(t.norm1$Height.1^2 + t.norm1$Height.2^2))
t.norm1$polar.2 <- atan((t.norm1$Height.2 / t.norm1$Height.1))
## And cast the polar data
> t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")), id=c("Sample.Name", "SNP"))
Error in if (!missing(id.var) && !(id.var %in% varnames)) { :   missing value where TRUE/FALSE needed
> traceback()

4: melt_check(data, id.var, measure.var)
3: melt.data.frame(as.data.frame(data), id = attr(data, "idvars"))
2: melt.cast_df(subset(t.norm1, select = c("Sample.Name", "SNP",
       "Pool", "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))
1: melt(subset(t.norm1, select = c("Sample.Name", "SNP", "Pool",
       "polar.1", "polar.2")), id = c("Sample.Name", "SNP"),
       measure = c("polar.1", "polar.2"))

########Finish Session 2##########

As far as I can tell, the error occurs within melt_check(), at the check of whether id.var is missing and whether id.var exists among the data frame's names. Both of these should be fine, since the subset() call works on its own...
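The message itself is what R prints whenever an if() condition evaluates to NA. The base-R sketch below reproduces that message without calling reshape; the object `df` is hypothetical. The traceback shows melt.cast_df passing id = attr(data, "idvars") rather than the id supplied in the call, so one possibility (an assumption, not confirmed here) is that this attribute is absent on the subsetted object.

```r
# An NA condition in if() produces the message seen in the session above
msg <- tryCatch(
  if (NA) "not reached",
  error = function(e) conditionMessage(e)
)
print(msg)

# attr() returns NULL for an attribute that is not set. "idvars" is the
# attribute name visible in the traceback; 'df' is a hypothetical stand-in
df <- data.frame(Sample.Name = "CA0001", SNP = "rs1045485")
id.var <- attr(df, "idvars")
print(is.null(id.var))

# Matching NULL against the column names gives a zero-length logical
# rather than a single TRUE or FALSE
print(length(id.var %in% names(df)))
```

Checking attr(x, "idvars") on the object passed to melt() would show whether this reading applies to the real data.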

########Start Session 3##########
> test <- subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2"))
> names(test)

[1] "Sample.Name" "SNP" "Pool" "polar.1" "polar.2"
########Finish Session 3##########

What I find particularly strange is that there isn't really any difference between
########Session 1
> t <- melt(t.norm1, id = c("Sample.Name", "SNP"))

....and
########Session 2
t <- melt(subset(t.norm1, select= c("Sample.Name", "SNP", "Pool", "polar.1", "polar.2")), id=c("Sample.Name", "SNP"))

..since I've done nothing to alter the "Sample.Name" and "SNP" columns. All that's changing is the names of the two measure.var columns, which in this instance are everything that's not defined as an id.var in the call to melt().
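One way to look for a difference between the two inputs is to compare their classes and attributes rather than their column names. The sketch below uses a hypothetical helper, compare_objects(), and made-up stand-in objects. It does not load reshape, so it only illustrates the kind of check and says nothing about what cast() or subset() actually return.

```r
# Hypothetical helper: compare the class and attribute names of two objects
compare_objects <- function(a, b) {
  list(
    same_class = identical(class(a), class(b)),
    only_in_a  = setdiff(names(attributes(a)), names(attributes(b))),
    only_in_b  = setdiff(names(attributes(b)), names(attributes(a)))
  )
}

# Stand-in for a cast() result: a data frame carrying an extra attribute
# (the attribute name mirrors the traceback; the object itself is made up)
full <- data.frame(Sample.Name = "CA0001", SNP = "rs1045485",
                   Height.1 = 0.003311454, Height.2 = 0.4789782)
attr(full, "idvars") <- c("Sample.Name", "SNP")

# A copy without that attribute, standing in for a modified object
plain <- full
attr(plain, "idvars") <- NULL

res <- compare_objects(full, plain)
str(res)
```

On the real objects the equivalent call would be compare_objects(t.norm1, test), with test as defined in Session 3. If an attribute does turn out to be missing, wrapping the subset in as.data.frame() before melt() might be worth trying, though that is untested here.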

If anyone can provide any insight to what I'm doing wrong I'd be very grateful.

Thanks,

Neil

--
Email - nshephard_at_gmail.com / n.shephard_at_sheffield.ac.uk

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 28 Feb 2008 - 11:45:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Feb 2008 - 13:30:18 GMT.
