Re: [R] questions regarding stat_smooth in ggplot area plot

From: Werner Heijstek <w.heijstek_at_gmail.com>
Date: Fri, 25 Mar 2011 10:11:55 +0100

Hi Dennis,

Thanks a lot for your insights.

I 'solved' the negative smooth by using ylim() instead of xlim(). If I may, I'll ask a third question: how can I plot several of these ggplot area plots stacked one above the other so that they share the same x-axis?

library(grid)  # provides viewport(), grid.layout(), pushViewport(), grid.newpage()

## Viewport for one cell of the layout
vp.layout <- function(x, y) viewport(layout.pos.row=x, layout.pos.col=y)

## Print several ggplot objects on one page, filling the grid row by row
arrange <- function(..., nrow=NULL, ncol=NULL, as.table=FALSE) {
 dots <- list(...)
 n <- length(dots)

 ## max(1, ...) avoids nrow = 0 when only one plot is passed
 if(is.null(nrow) && is.null(ncol)) { nrow = max(1, floor(n/2)) ; ncol = ceiling(n/nrow)}
 if(is.null(nrow)) { nrow = ceiling(n/ncol)}
 if(is.null(ncol)) { ncol = ceiling(n/nrow)}
 ## NOTE see n2mfrow in grDevices for possible alternative
 grid.newpage()
 pushViewport(viewport(layout=grid.layout(nrow,ncol) ) )
 ii.p <- 1
 for(ii.row in seq(1, nrow)){
  ii.table.row <- ii.row
  if(as.table) {ii.table.row <- nrow - ii.table.row + 1}
  for(ii.col in seq(1, ncol)){
   ii.table <- ii.p
   if(ii.p > n) break
   print(dots[[ii.table]], vp=vp.layout(ii.table.row, ii.col))
   ii.p <- ii.p + 1
  }
 }
}

set <- read.table(file="http://www.jovian.nl/set.csv", head=1, sep=",")
set2 <- read.table(file="http://www.jovian.nl/set2.csv", head=1, sep=",")
library(ggplot2)
s <- ggplot(set, aes(x = time, y = hours)) +
     geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
     geom_area(stat = 'smooth', span = 0.2, alpha = 0.3) + ylim(0,40)
s1 <- ggplot(set2, aes(x = time, y = hours)) +
     geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
     geom_area(stat = 'smooth', span = 0.2, alpha = 0.3) + ylim(0,40)
arrange(s,s1,ncol=1)

The arrange() function was taken from
http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html.

In this example, the x-axes only line up because the two data sets have the same range; arrange() does nothing more than place two independent plots one above the other. How can I "merge" these two (and later more) area plots so that they share a single x-axis, drawn only once at the bottom of the figure?
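
One possible way to approach the shared-axis question is to stack the data frames, add a column that identifies the source, and let ggplot2 facet on that column, so that the x-axis is drawn once below the bottom panel. The following is only a sketch under assumptions: the data are synthetic stand-ins for set.csv and set2.csv (which are not reproduced here), and the column name `panel` is an illustrative choice.

```r
library(ggplot2)

# Synthetic stand-ins for set.csv and set2.csv (assumed columns: time, hours).
set.seed(42)
time <- 27:85
set  <- data.frame(time = time,
                   hours = pmax(0, 15 + 10 * sin(time / 6) + rnorm(length(time), sd = 2)))
set2 <- data.frame(time = time,
                   hours = pmax(0, 12 + 8 * cos(time / 8) + rnorm(length(time), sd = 2)))

# Stack both data sets and label each row with its source.
# The column name `panel` is a hypothetical name chosen for this sketch.
both <- rbind(cbind(set,  panel = "set"),
              cbind(set2, panel = "set2"))
both$panel <- factor(both$panel, levels = c("set", "set2"))

# One panel per data set, arranged in rows. With facet_grid() the panels
# share the x scale and the axis is drawn once, under the bottom panel.
p <- ggplot(both, aes(x = time, y = hours)) +
  geom_area(colour = "red", fill = "red", alpha = 0.5) +
  geom_area(stat = "smooth", span = 0.2, alpha = 0.3) +
  ylim(0, 40) +
  facet_grid(panel ~ .)
```

Calling print(p) draws the figure. Further data sets can be added by binding more rows with their own `panel` value, which avoids the manual viewport bookkeeping in arrange().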

Thanks,

Werner

On Thu, Mar 24, 2011 at 6:08 PM, Dennis Murphy <djmuser_at_gmail.com> wrote:
> Hi:
>
> On Thu, Mar 24, 2011 at 7:21 AM, jovian <w.heijstek_at_gmail.com> wrote:
>>
>> Hello,
>>
>> I drew a simple area plot using ggplot2 using
>>
>> set <- read.table(file="http://www.jovian.nl/set.csv", head=1,  sep=",")
>> library(ggplot2)
>> ggplot() +
>> layer(
>>  data = set, mapping = aes(x = time, y = hours),
>>  geom = "area", stat="smooth", color="red"
>> ) +
>> layer(
>>  data = set, mapping = aes(x = time, y = hours),
>>  geom = "area", color="red", fill="red", alpha="0.5"
>> )
>>
>> I have two questions about this visualisation:
>>
>> - The smooth function is too "rough" right now, how do I make it follow
>> the original values more closely?
>
> On the contrary, the function is too smooth - if you want it to conform
> better to the observed data, you have to 'roughen' it, which means reducing
> the span argument (see stat_smooth for details). Higher spans (or
> equivalently, wider bandwidths) generate smoother curves.
>
>> - The smooth function turns up with negative values (e.g. at the
>> beginning):
>> How do I prevent this? (e.g. to use 0 instead of any negative value.)
>
> It appears that in ggplot2, loess fits a local quadratic function to the
> data within its span (essentially, the bandwidth of the x window that
> contains 100*span% of the data). The wider the bandwidth, the smoother the
> function. As the window moves from left to right, its width will change so
> that it contains 100*span% of the data. It does some other magic to smooth
> the individual local fits, but basically the degree of smoothness is a
> function of the span. Your times start at x = 27, but the first nonzero y
> (hours) doesn't occur until x = 40, so you could restrict the extent of
> x-values with the xlim() argument if you want to get rid of the visual
> anomaly on the left end of your plot. Here's one approach:
>
> set <- read.table(file="http://www.jovian.nl/set.csv", head=1,  sep=",")
> library(ggplot2)
> ggplot() +
> layer(
>  data = set, mapping = aes(x = time, y =hours),
>  geom = "area", stat="smooth", span = 0.3, color="black", alpha = 0.3
> ) +
> layer(
>  data = set, mapping = aes(x = time, y = hours),
>  geom = "area", color="red", fill="red", alpha = 0.5
> ) +
> xlim(40, 85)
>
> or
> ggplot(set, aes(x = time, y = hours)) +
>      geom_area(colour = 'red', fill = 'red', alpha = 0.5) +
>      geom_area(stat = 'smooth', span = 0.3, alpha = 0.3) + xlim(40, 85)
>
> I toyed with some of the plot parameters - you could do the same to get what
> you want. The two primary changes are the introduction of the span parameter
> in the first layer, associated with stat_smooth(), and the use of xlim() at
> the end to restrict the extent of the x-values to be displayed. You will get
> a warning about 'Removed 13 rows containing missing values', but those
> values are the times from 27-39 where hours = 0. If you need to have those
> times in the plot, then you'll have to live with the curve output by
> stat_smooth, even if it dips below zero. This is a consequence of the local
> quadratic fit. It's possible to get local linear fits (IIRC, that comes with
> degree = 1 in loess()), but I'll let you play with that if you so wish.
>
> stat_smooth() also has an n = argument. Note that it sets the number of
> points at which the fitted smooth is evaluated, not the size of the fitting
> window, so it affects how finely the curve is drawn; the window itself is
> still controlled by span.
>
> HTH,
> Dennis
>
>> Thanks,
>>
>> Werner
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/questions-regarding-stat-smooth-in-ggplot-area-plot-tp3402632p3402632.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
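
Dennis's remarks about span and degree can be tried out directly with loess() from base R, which is the kind of smoother his reply describes. The sketch below uses a made-up series (13 zeros for times 27 to 39, then noisy positive values), not the real set.csv, and the helper `rss` is a hypothetical name introduced for illustration. It compares a wide and a narrow span, fits a local linear variant with degree = 1, and shows one simple way to replace negative fitted values with 0.

```r
# Synthetic series standing in for set.csv: zeros for times 27..39,
# then noisy positive values (all values are made up for illustration).
set.seed(1)
time  <- 27:85
hours <- c(rep(0, 13),
           20 + 10 * sin(seq(0, 6, length.out = 46)) + rnorm(46, sd = 3))
d <- data.frame(time = time, hours = hours)

# Larger span = wider window = smoother curve; smaller span follows the data.
smooth_fit <- loess(hours ~ time, data = d, span = 0.9)
rough_fit  <- loess(hours ~ time, data = d, span = 0.3)

# degree = 1 requests local linear fits instead of the default local quadratic.
linear_fit <- loess(hours ~ time, data = d, span = 0.3, degree = 1)

# Residual sum of squares as a crude measure of how closely a fit tracks the data.
rss <- function(fit) sum(residuals(fit)^2)
c(span_0.9 = rss(smooth_fit), span_0.3 = rss(rough_fit))

# Replace any negative fitted value with 0 before plotting.
clamped <- pmax(fitted(rough_fit), 0)
```

Clamping with pmax() only changes what is drawn; the underlying fit still dips below zero wherever the local polynomial overshoots.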



Received on Fri 25 Mar 2011 - 13:07:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 Mar 2011 - 13:10:24 GMT.
