Re: [R] pvclust missing values problem

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Mon 10 Jul 2006 - 22:03:00 EST

On Mon, 2006-07-10 at 11:56 +0100, Richard Birnie wrote:
> Hello all,

Hi Richard,

Sorry, I know nothing about pvclust and have never used it, but here are a couple of general suggestions/observations.

You are asked to contact the package maintainer, *not* R-Help, for questions relating to contributed packages. I dare say only the maintainer (or some kind soul with too much time on their hands to debug the package) can solve your problem, if you can't work out how to do it with R's debug tools. I have cc'd the maintainer on this reply.

When you get an error, it is useful to supply the output from traceback() to see where the error actually occurred.
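A minimal sketch of what that looks like. The functions inner_fn, outer_fn and capture_stack are made up for illustration and have nothing to do with pvclust. At an interactive prompt, calling traceback() straight after the error is enough; in a script, the call stack can be recorded with a calling handler instead.

```r
# Hypothetical functions, for illustration only.
inner_fn <- function(x) stop("something went wrong in inner_fn")
outer_fn <- function(x) inner_fn(x)

# Interactive use: run the failing call, then ask where it failed.
#   outer_fn(1)
#   traceback()

# Non-interactive alternative: record the call stack at the moment the
# error is signalled, then let tryCatch() stop the error from aborting.
capture_stack <- function(expr) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(expr, error = function(e) stack <<- sys.calls()),
    error = function(e) e
  )
  # Keep only the first deparsed line of each call for readability.
  calls <- vapply(as.list(stack), function(cl) deparse(cl)[1], character(1))
  list(error = result, calls = calls)
}

out <- capture_stack(outer_fn(1))
print(conditionMessage(out$error))
print(out$calls)
```

The printed calls run from the outermost call down to the one that failed, which is the information a maintainer needs to locate the problem.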

By the way, the error "NA/NaN/Inf in foreign function call (arg 11)" doesn't necessarily mean you have missing values in your data set. Note the NaN and Inf in that error message. It could just be that one of the intermediate calculations produced NaNs or Infs, which were then caught when hclust passed them on to compiled code, raising the error. Without traceback(), this is pure speculation.
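A minimal sketch of one way this can happen, on invented toy data. It assumes, without having verified it against the pvclust source, that a correlation-based distance is computed as 1 minus cor(). A sample whose values are all identical has zero standard deviation, so its correlations are undefined and NAs appear in the distance matrix even though the data contain none.

```r
# Toy data (invented): sample2 is constant, but there are no NAs anywhere.
x <- data.frame(
  sample1 = c(1, 0, -1, 0),
  sample2 = c(0, 0, 0, 0),
  sample3 = c(1, -1, 0, 1)
)
no_na_in_data <- !anyNA(x)

# Correlation is undefined for a zero-variance column, so cor() returns NA
# for it (with a warning, suppressed here).
r <- suppressWarnings(cor(x, use = "pairwise.complete.obs"))
na_in_cor <- anyNA(r)

# Assumed form of a correlation distance; the NAs carry over into it.
d <- as.dist(1 - r)

# hclust() refuses a distance object containing NA/NaN/Inf.
res <- try(hclust(d, method = "average"), silent = TRUE)

# A quick check for the suspected cause: columns with zero variance.
constant_cols <- names(x)[sapply(x, function(col) sd(col) == 0)]

print(no_na_in_data)
print(na_in_cor)
print(constant_cols)
```

If a check like this finds constant columns in the real data, that would also be consistent with a Euclidean distance running without error, since it does not divide by the standard deviation.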

R version info, platform (OS) and version of pvclust are all useful extra bits of info that you could have provided us or the maintainer as well.
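As a sketch, these details can be collected with a few base R calls. The variable name info is just for illustration, and the pvclust version is only looked up if the package is installed.

```r
# Collect the details worth including in a report.
info <- list(
  r_version = R.version.string,   # version of R in use
  platform  = R.version$platform  # OS / architecture R was built for
)

# Only query the package version if pvclust is actually installed.
if (requireNamespace("pvclust", quietly = TRUE)) {
  info$pvclust <- as.character(packageVersion("pvclust"))
}

print(info)

# sessionInfo() gives a fuller summary, including attached packages.
print(sessionInfo())
```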

Hopefully you can sort your problem out with the maintainer.

Best,

G

>
> I posted a question to this list last week and received no response. I
> am unsure if this means no-one knows the answer or if I posed the
> question badly. I'm going to assume I posed the question badly and try
> again. I am new to R so it is quite likely it's a very naive question,
> however if there is something blindingly obvious that I am missing or
> if there is another resource I should consult that I haven't seen
> would someone be kind enough to point it out because it isn't obvious
> to me. Although my data is from biological experiments I think my
> problem is with R rather than the nature of the data, but I may be
> wrong.
>
> I am attempting to use the pvclust package to do some hierarchical
> clustering on some CGH data I have downloaded from the Progenetix
> database
> (http://www.progenetix.de/~pgscripts/progenetix/Aboutprogenetix.html).
> The data is in tab delimited format: each column is a single sample and
> each row is a chromosome band. Some example dummy data is shown below.
>
> band sample1 sample2 sample3 sample4
> 1p36_33 1 0 0 1
> 1p36_32 -1 0 -1 0
> 1p36_31 0 1 1 1
> 1p36_22 0 -1 -1 -1
> etc.... where 0 = no change, 1 = gain, -1 = loss
>
> I have read this file into R using:
> > ProgenetixCRC.all.noXY <-
> read.table("/home/marraydb/Progenetix/Data/CRCall_noXY.txt",
> header=TRUE, sep="\t", row.names="band")
>
> based on the pvclust documentation I came up with this:
> >ProgenetixCRC.all.pvclust <- pvclust(ProgenetixCRC.all,
> method.dist="cor",
> method.hclust="average",use.cor="pairwise.complete.obs",nboot=1000)
>
> this results in an error
> Error in hclust(distance, method = method.hclust) :
> NA/NaN/Inf in foreign function call (arg 11)
> Digging through the mailing list archives I've discovered this means
> that my dataset has missing values. This is very confusing because I
> have checked and there are no missing values. Running is.na() over the
> data matrix results in all false values which I take to mean none of
> the values are NA. I tried various options for the use.cor argument
> all with the same result.
>
> Since I originally posted this question I tried changing method.dist
> to euclidean, in this form the function executes without any errors.
> This is not to say the results actually mean anything of course. I am
> at a loss as to how to proceed; any input from someone more experienced
> would be gratefully appreciated. If there is some reason why I should
> not be doing this analysis this way in the first place then I'd
> appreciate having that pointed out also. I've tried not to put excess
> information in here but if more is needed then let me know what and
> I'll post it.
>
> I suspect the problem is me, however if it really is the case that
> no-one knows how to answer this then could anyone suggest another
> mailing list where I might get a better response. Would bioconductor
> be a better option for example?
>
> Apologies for any offence caused by posting the same question but it's
> difficult for me to proceed until I get some kind of response, even if
> it is that this list is not the right place for this question.
>
> Thanks for your patience,
> Richard

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/cv/
 London, UK. WC1E 6BT.         [w] http://www.ucl.ac.uk/~ucfagls/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jul 10 22:07:49 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 11 Jul 2006 - 00:17:17 EST.