Re: [R] Thinking about using two y-scales on your plot?

From: Richard Cotton <Richard.Cotton_at_hsl.gov.uk>
Date: Mon, 07 Apr 2008 03:24:23 -0700 (PDT)

thegeologician wrote:
>
> A plot of the actual temperature during a year (or thousands of years,
> as people in palaeoclimate-studies are rather used to) is just so much
> more intuitive, than some correlation-coefficients or such. I know I'm
> largely speaking to statisticians in this forum, but in Earth Sciences,
> most people aren't... I see the use of correlation coefficients and
> -plots in proving that an apparent correlation is "real", but the first
> question upon presenting any statistical analysis is always "What does the
> DATA look like?".
>

Agreed - the data itself is much easier to get to grips with than correlation coefficients.

thegeologician wrote:
>
> Of course, these plots could be plotted separately with a common x-axis,
> it's just a matter of saving space and of being used to that kind of
> graph. I can't imagine anyone being falsely led to a thought like "oh
> gosh, the temperature is much higher/bigger/more than the
> precipitation!" - that makes no sense. I do see the point in graphs
> where values are plotted together, whose possible interaction with each
> other might lead to wrong conclusions. Then, it might not be obvious
> that one is drawing a senseless conclusion.
>

I think in the temperature/precipitation case, whether to draw multiple y-axes or not is a fairly minor decision. The reader would have to be pretty dumb to assume that temperature and precipitation can be compared directly. The point is that it can appear that way - so the reader has to engage their brain and tell themselves "ignore the obvious comparisons between the lines that I perceive". This is clearly not a desirable trait in a graph.

I've concocted an example to show that it's possible to mislead unwary readers by changing the scale of the second y-axis.

This uses the nottem temperature dataset built into R, and some made-up precipitation data.

#Generate some precipitation data
precipitation <-
  30+runif(240,5,10)*sin(seq(pi/6,40*pi,pi/6)+pi/4)+rnorm(240,0,3)
pts <- ts(precipitation, start=1920, frequency=12)

#First plot, correlation is apparent
plot(nottem)
par(new=TRUE)
plot(pts, axes=FALSE, col="blue", ylab="")
Axis(side=4)

#Second plot, scale changing makes it appear that precipitation does not vary with temperature.
plot(nottem)
par(new=TRUE)
plot(pts, axes=FALSE, col="blue", ylab="", ylim=c(0,10000))
Axis(side=4)

I'm willing to concede that the attempt at misleading the audience is pretty artificial, and not very subtle. A more dangerous case would be the opposite situation - making a correlation become visible on a plot where none really exists, by fiddling with axis transformations (you could use a log scale on the second y-axis, or any other transformation you wished).
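
A rough sketch of that more dangerous case, under made-up assumptions: two random walks generated independently of each other, with the right-hand axis limits chosen so that the second series has the same on-screen centre and spread as the first. The names a, b and matched_ylim are hypothetical, and rescaling (plus an optional axis flip) is just one of the many transformations one could pick; how convincing the resulting overlay looks will depend on the random draw.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
n <- 240
a <- ts(cumsum(rnorm(n)), start = 1920, frequency = 12)
b <- ts(cumsum(rnorm(n)), start = 1920, frequency = 12)  # generated independently of a

# Hypothetical helper: limits for the right-hand axis that put 'right' on the
# same standardised (mean/sd) footing as 'left'. With flip = TRUE the limits
# come out reversed, which turns the right-hand axis upside down.
matched_ylim <- function(left, right, flip = FALSE) {
  sgn <- if (flip) -1 else 1
  mean(right) + sgn * (range(left) - mean(left)) / sd(left) * sd(right)
}

# Flip the second axis if the sample correlation happens to be negative,
# so that the two lines appear to move together either way.
lim_b <- matched_ylim(a, b, flip = cor(as.numeric(a), as.numeric(b)) < 0)

plot(a, ylab = "a")
par(new = TRUE)
plot(b, axes = FALSE, col = "blue", ylab = "", ylim = lim_b)
Axis(side = 4)
```

Nothing in the picture tells the reader that the right-hand scale was picked after looking at the data, which is what makes this kind of overlay hard to audit.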

I suspect that the popularity of multiple y-axes arose from a greater need to save space in paper-based journals, but in the age of electronic documents, is space saving really that important?
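
For comparison, a minimal sketch of the alternative mentioned earlier in the thread: separate panels sharing a common x-axis. It assumes the same made-up precipitation recipe as the example above (regenerated here so the snippet runs on its own; the seed is arbitrary) and relies on the default multi-panel behaviour of plot() for a multivariate time series.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Made-up precipitation data, same recipe as in the example above
precipitation <-
  30 + runif(240, 5, 10) * sin(seq(pi/6, 40*pi, pi/6) + pi/4) + rnorm(240, 0, 3)
pts <- ts(precipitation, start = 1920, frequency = 12)

# Bind the two monthly series into one multivariate time series;
# nottem is the temperature dataset built into R.
both <- cbind(temperature = nottem, precipitation = pts)

# One panel per series, stacked on a shared time axis, each with its own y-scale
plot(both, main = "")
```

Each series keeps its own clearly labelled scale, so there is no crossing of lines for the reader to misinterpret, at the cost of a taller figure.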



Regards,
Richie.

Mathematical Sciences Unit
HSL

Received on Mon 07 Apr 2008 - 10:29:20 GMT