[R] Re: In need of help with correlations

From: Petr PIKAL <petr.pikal_at_precheza.cz>
Date: Mon, 11 Apr 2011 09:04:40 +0200

Hi

r-help-bounces_at_r-project.org wrote on 09.04.2011 19:24:38:

> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
>
> I need to obtain Pearson and Spearman correlation coefficients, and
> corresponding p-values for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).
>
> I've looked through several Intro manuals and the R help files.
>
> I know that "cor(x,y, use ="everything", method = c("pearson")) " can help
> obtain the coefficients.
>
> I also know that "cor.test()" is supposed to test the significance of a
> single correlation coefficient.
>
> I've also found the bioconductor package "genefilter" / "genefinder" that
> looks for correlations to a given gene (although I can't get it to work).
>
> So far I've been able to:
>
> #Read in the csv file
> data<-read.csv("my data.csv")
>
> #Check the dimensions, names, class, fix(data) to ensure the file was loaded properly
> dim(data)
> names(data)
> class(data)
> fix(data)
>
> #So far I've been able to successfully correlate the entire 'column' matrix through:
> x <- data[,2:30]
> y <- data[,2:30]
>
> corr.data<-cor(x,y, use = "everything", method = c("pearson"))
>
> write.csv(corr.data, file = "correlation of my data by columns.csv")
>
> -----------------------------------
>
> Now if I try and run the 'cor.test()' function on the same matrix, I get an
> error message with 'x' must be a numeric vector. This I don't understand.

The cor.test help page says

x, y: numeric vectors of data values. ‘x’ and ‘y’ must have the same length.

however, your data[,2:30] is most probably a data frame, not a numeric vector, see

str(data[,2:30])
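
The difference can be seen on a small made-up data frame. This is only a sketch: `toy` and its column names are hypothetical stand-ins for the poster's data, not the actual file.

```r
# Toy stand-in for the poster's data: an identifier column plus numeric samples
set.seed(1)
toy <- data.frame(id = paste0("probe", 1:5),
                  s1 = rnorm(5), s2 = rnorm(5), s3 = rnorm(5))

sub <- toy[, 2:4]      # several columns: still a data frame (a list of columns)
col <- toy[, 2]        # a single column: dropped to a plain numeric vector

is.data.frame(sub)     # TRUE
is.numeric(sub)        # FALSE, which is why cor.test() complains
is.numeric(col)        # TRUE

# cor() accepts data frames or matrices; cor.test() wants a pair of vectors
ct <- cor.test(toy[, 2], toy[, 3])
ct$estimate            # correlation coefficient
ct$p.value             # its p-value
```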

To run cor.test you need to pass it two numeric vectors, e.g.

cor.test(data[,2], data[,3])

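or to do it in a loop over pairs of columns, keeping the p-value of each test (untested)
result <- matrix(NA, 20, 20)

for (i in 2:19) {
  # note the parentheses: i+1:20 would mean i + (1:20)
  for (j in (i + 1):20) {
    # cor.test() returns a list, so store only its p-value
    result[i, j] <- cor.test(data[, i], data[, j])$p.value
  }
}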

But most probably there are other ways.
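
One such other way, for the one-gene-against-all case in the original question, might look like the sketch below. It assumes the expression values are held in a numeric matrix with probesets as rows and samples as columns; the toy matrix `expr`, the probe names, the target `probe7` and the helper `cor_to_target()` are all made up for illustration. With the real data one could perhaps build the matrix with as.matrix(data[, -1]) and take the row names from data[, 1].

```r
# Toy expression matrix: rows are probesets, columns are samples (assumed layout)
set.seed(42)
expr <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("s", 1:30)))

target <- "probe7"                 # hypothetical gene of interest
x <- expr[target, ]                # its profile across the 30 samples

# Run cor.test() of the target against every row for one method and
# collect the coefficient and its p-value.
cor_to_target <- function(method) {
  res <- apply(expr, 1, function(y) {
    ct <- cor.test(x, y, method = method)
    c(estimate = unname(ct$estimate), p.value = ct$p.value)
  })
  t(res)                           # one row per probeset
}

pearson  <- cor_to_target("pearson")
spearman <- cor_to_target("spearman")

out <- data.frame(probe      = rownames(expr),
                  pearson_r  = pearson[, "estimate"],
                  pearson_p  = pearson[, "p.value"],
                  spearman_r = spearman[, "estimate"],
                  spearman_p = spearman[, "p.value"])
out <- out[order(out$pearson_p), ] # smallest Pearson p-values first
head(out)
```

The target probe correlates perfectly with itself, so it sorts to the top and can be dropped. With about 45,000 probesets the raw p-values would normally also be adjusted for multiple testing, for example with p.adjust().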

Regards
Petr

> And this is not my goal, but rather me trying to learn how to go about doing
> correlation analysis in R.
>
> I've also tried transposing the data.frame using "as.data.frame(t(data))"
> and doing so gives the same error message as above.
>
> Can anyone help me with figuring out how to conduct a correlation analysis
> for specific gene/probeset, and help me understand why I get the above error
> message? I know it probably is a simple analysis, that is probably just over
> my head right now since I'm still new to R. But I can't figure it out and
> have been trying with a bunch of different variations for the past week.
>
> Thank you in advance for your help.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Received on Mon 11 Apr 2011 - 07:31:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.