Re: [R] Problem with rowMeans()

From: Erik Iverson <iverson_at_biostat.wisc.edu>
Date: Thu, 12 Jun 2008 19:16:21 -0500

ss wrote:
> Thanks, Erik. I will try your code soon.
>
> I did this first:
>
> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> row.names = NULL ,header=TRUE, fill=TRUE)
> > class(data[[3]])
> [1] "factor"
> > is.numeric(data[[3]])
> [1] FALSE
> >
>
> So it is not numeric but 'factor' instead.
> Can I convert this column to numeric?

That depends. My first question, if I were you, would be: 'Why does read.table assign the class factor to this column?'

Then read ?factor, paying particular attention to this passage:

   In particular,

      'as.numeric' applied to a factor is meaningless, and may happen by
      implicit coercion.  To transform a factor 'f' to its original
      numeric values, 'as.numeric(levels(f))[f]' is recommended and
      slightly more efficient than 'as.numeric(as.character(f))'.

You might also try levels(data[[3]]), but the list will be long. The goal is to find the value(s) that are causing read.table to assign the class 'factor' to this column. You have lots of values though, so I might try something like the following:

setdiff(levels(data[[3]]),
as.character(as.numeric(levels(data[[3]])[data[[3]]])))

and look at what it returns (you will get a warning that NAs were introduced by coercion). Hopefully that shows you which values are not numeric.
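A variant of the same idea, sketched on a hypothetical stand-in for data[[3]] (the values are invented): test the converted levels directly for NA. Because the setdiff() comparison goes through as.character(), a perfectly numeric level such as "0.10" comes back as "0.1" and would be reported too; checking is.na() avoids that false positive.

```r
# Hypothetical stand-in for data[[3]]: numbers, one stray token, one NA.
col <- factor(c("-1.6", "0.18", "0.10", "n/a", "-0.76", NA))

lv <- levels(col)   # NA is not a level, so it is ignored here
# as.numeric() gives NA (plus a warning) for anything it cannot parse;
# suppressWarnings() silences that expected warning.
bad <- lv[is.na(suppressWarnings(as.numeric(lv)))]
bad                 # only the non-numeric label(s)

# Rows of the original column holding one of those labels.
rows <- which(col %in% bad)
rows
```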

I see your new email, so that's that!

Good luck,
Erik

>
> Allen
>
> On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson <iverson_at_biostat.wisc.edu> wrote:
>
>
>
> ss wrote:
>
> It is:
>
> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> row.names = NULL ,header=TRUE, fill=TRUE)
> > class(data[3])
> [1] "data.frame"
> >
>
>
> Oops, I should have said class(data[[3]]) and
> is.numeric(data[[3]]).
>
> See ?Extract
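The difference ?Extract describes, shown on a tiny made-up data frame: single brackets return a one-column data frame, double brackets return the column vector itself. That is why is.numeric(data[3]) was FALSE regardless of what the column contains.

```r
# Hypothetical two-column data frame.
df <- data.frame(id = c("a", "b"), value = c(1.5, 2.5))

class(df[2])         # "data.frame": [ ] keeps a one-column data frame
class(df[[2]])       # "numeric": [[ ]] extracts the column vector
is.numeric(df[2])    # FALSE: a data frame is a list, never numeric
is.numeric(df[[2]])  # TRUE
```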
>
>
>
> And if I use as.matrix(read.table()), I get:
>
> > data <- as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> + row.names = NULL ,header=TRUE, fill=TRUE))
> > data[1:4,1:4]
> Probe_ID Gene_Symbol M16012391010920 M16012391010525
> [1,] "A_23_P105862" "13CDNA73" "-1.6" " 0.16"
> [2,] "A_23_P76435" "15E1.2" "0.18" " 0.59"
> [3,] "A_24_P402115" "15E1.2" "1.63" "-0.62"
> [4,] "A_32_P227764" "15E1.2" "-0.76" "-0.42"
> You can see that the values are surrounded by quotes ("").
>
> I don't see them if I just use read.table.
>
>
>
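> That is because matrices (objects of class 'matrix') are of
> homogeneous type: every element must have the same type. Since your
> data frame also has non-numeric columns (Probe_ID, Gene_Symbol),
> as.matrix() converts everything to character, including the numbers,
> which you certainly do NOT want.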
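A minimal sketch of that coercion on a hypothetical two-row version of the table (the sample columns S1 and S2 are made-up names). as.matrix() on a data frame with any non-numeric column formats the numeric columns as text, which is probably where padded strings such as " 0.16" come from; dropping the non-numeric columns first keeps the matrix numeric.

```r
# Hypothetical two-row version of the expression table.
df <- data.frame(Probe_ID = c("A_23_P105862", "A_23_P76435"),
                 S1 = c(-1.6, 0.18),
                 S2 = c(0.16, 0.59))

m_all <- as.matrix(df)   # one type per matrix, so everything becomes character
typeof(m_all)            # "character"
m_all[, "S2"]            # the numbers are now stored as formatted strings

# Keeping only the numeric columns gives a numeric matrix instead.
m_num <- as.matrix(df[sapply(df, is.numeric)])
typeof(m_num)            # "double"
```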
>
> You want a data.frame; here is an example of what I think you
> are after.
>
> Try the following commands and see how they compare to your
> situation: these work for me.
>
> test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
>                    y = rnorm(26), z = rnorm(26))
>
> test
>
> class(test)
>
> is.numeric(test[[2]])
>
> is.numeric(test[[3]])
>
> rowMeans(test)        # fails with an error: column x is a factor
>
> rowMeans(test[2:3])   # works: only the numeric columns y and z
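The same test data can be carried one step further by selecting columns by type with sapply(..., is.numeric) instead of by position, the data[sapply(data, is.numeric)] idiom used elsewhere in this thread. This is a sketch; set.seed() and na.rm = TRUE are additions (for reproducibility and for rows containing NAs), not part of the original commands.

```r
set.seed(1)  # only so the random example is reproducible
test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
                   y = rnorm(26), z = rnorm(26))

# rowMeans() converts a data frame with as.matrix(); the factor column x
# makes that a character matrix, so the call stops with an error.
err <- tryCatch(rowMeans(test), error = function(e) e)
inherits(err, "error")   # TRUE

# Select the numeric columns by type instead of by position.
num_cols <- sapply(test, is.numeric)             # x FALSE, y TRUE, z TRUE
means <- rowMeans(test[num_cols], na.rm = TRUE)  # na.rm skips missing values
head(means)
```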
>
> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> row.names = NULL ,header=TRUE, fill=TRUE)
> > data[1:4,1:4]
> Probe_ID Gene_Symbol M16012391010920 M16012391010525
> 1 A_23_P105862 13CDNA73 -1.6 0.16
> 2 A_23_P76435 15E1.2 0.18 0.59
> 3 A_24_P402115 15E1.2 1.63 -0.62
> 4 A_32_P227764 15E1.2 -0.76 -0.42
>
>
> Thanks,
> Allen
>
>
>
> On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <iverson_at_biostat.wisc.edu> wrote:
>
>
>
> ss wrote:
>
> Hi Wacek,
>
> Yes, data is a data frame, not a matrix.
>
> is.numeric(data[3])
>
> [1] FALSE
>
>
> What is class(data[3])?
>
>
> But I looked at column 3 and it looks okay. There are a few NAs,
> and I did not find anything strange.
>
> Any suggestions?
>
> Thanks,
> Allen
>
>
>
> On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk_at_idi.ntnu.no> wrote:
>
> ss wrote:
>
> Thank you very much, Wacek! It works very well.
> But there is a minor problem. I did the following:
>
> data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> +row.names = NULL ,header=TRUE, fill=TRUE)
>
> looks like you have a data frame, not a matrix
>
>
> dim(data)
>
> [1] 23963 85
>
> data[1:4,1:4]
>
> Probe_ID Gene_Symbol M16012391010920 M16012391010525
> 1 A_23_P105862 13CDNA73 -1.6 0.16
> 2 A_23_P76435 15E1.2 0.18 0.59
> 3 A_24_P402115 15E1.2 1.63 -0.62
> 4 A_32_P227764 15E1.2 -0.76 -0.42
>
> data1<-data[sapply(data, is.numeric)]
> dim(data1)
>
> [1] 23963 82
>
> data1[1:4,1:4]
>
> M16012391010525 M16012391010843 M16012391010531 M16012391010921
> 1 0.16 -0.23 -1.40 0.90
> 2 0.59 0.28 -0.30 0.08
> 3 -0.62 -0.62 -0.22 -0.18
> 4 -0.42 0.01 0.28 -0.79
>
> You will notice that after using 'data[sapply(data, is.numeric)]' to get
> data1, the first sample in data, 'M16012391010920', is missing from data1.
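One way this can happen, sketched with a hypothetical three-sample file: a single unparseable token (here "null") makes read.table() keep the whole column as text, and the sapply() filter then silently drops it. If the token is really a missing-value marker, listing it in na.strings keeps the column numeric. The file contents and column names are invented, and the text= argument of read.table() is only available in later R releases than the one used in this 2008 thread.

```r
# Hypothetical file contents; "null" in sample S1 is a stray non-numeric entry.
txt <- "Probe_ID S1 S2 S3
P1 -1.6 0.16 -0.23
P2 null 0.59 0.28
P3 1.63 -0.62 -0.62"

d <- read.table(text = txt, header = TRUE)
sapply(d, class)   # S1 is "character" (a "factor" before R 4.0.0), not numeric

d_num <- d[sapply(d, is.numeric)]
names(d_num)       # "S2" "S3": S1 was dropped without any message

# Declaring the placeholder as missing keeps S1 numeric, with NA in row 2.
d2 <- read.table(text = txt, header = TRUE, na.strings = c("NA", "null"))
is.numeric(d2$S1)  # TRUE
```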
>
> Any further suggestions?
>
> Surely there must be an entry in column 3 that makes it non-numeric.
> What does is.numeric(data[3]) say? (NAs should not make a column
> non-numeric, unless there are only NAs there, which is not the case
> here.) Check your data for non-numeric entries in column 3; there may
> be a typo.
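Wacek's remark about NAs, checked on a tiny made-up table: a column with some NA entries still reads as numeric, while a column made up only of NAs comes back as logical and so is not numeric. The sketch uses read.table()'s text= argument, which is only available in later R releases.

```r
# Made-up table: column b contains nothing but NA.
txt <- "a b c
1.5 NA NA
NA NA 2
3.0 NA 4"

d <- read.table(text = txt, header = TRUE)
sapply(d, class)   # a "numeric", b "logical", c "integer"
sapply(d, is.numeric)
```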
>
> vQ
>
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>



Received on Fri 13 Jun 2008 - 00:19:24 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 Jun 2008 - 00:31:03 GMT.

