Re: [R] Stacked Histogram, multiple lines for dates of news stories?

From: Jim Lemon <jim_at_bitwrit.com.au>
Date: Tue, 29 Jun 2010 19:03:18 +1000

On 06/29/2010 01:04 AM, Simon Kiss wrote:
> Dear colleagues,
> I have extracted the dates of several news stories from a newspaper data
> base to chart coverage trends of an issue over time. They are in a data
> frame that looks just like one generated by the reproducible code below.
> I can already generate a histogram of the dates with various intervals
> (months, quarters, weeks, years) using hist.Date. However, there are two
> other things I'd like to do.
> First, I'd like to either create a stacked histogram so that one could
> see whether one newspaper really pushed coverage of an issue at a
> certain point while others then followed later on in time. Second, or
> alternatively, I would like to do a line graph of the same data for the
> different papers to represent the same trends.
> I guess what I'm finding challenging is that I don't have counts of the
> number of stories on each day or in each week or in each month; I just
> have the dates themselves. The hist.Date command was very useful in
> turning those into bins, but I'd like to push it a bit further, to a
> stacked histogram or a multiple line chart.
> Can anyone suggest a way to go about doing this?
>
> I should say, I played around in Hadley Wickham's ggplot package and
> looked at his website, and there is a way to render multiple lines here:
> http://had.co.nz/ggplot2/scale_date.html
> but it was not clear to me how to plot just the dates or an index of the
> dates as I don't have a value for the y axis, other than the number of
> times a story was published in that time frame.
>
Hi Simon,
I had to think about this for a while, but the following may be what you want. It also gave me an idea for a new plot. Thanks.

Jim

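library(plotrix)
# Bin each paper's dates. hist() treats breaks=6 only as a suggestion, so
# the bin count can differ (the calls below assume count3 has 5 bins).
count1<-hist(as.numeric(test_df$test2[test_df$test=="Globe and Mail"]),
  breaks=6)$counts
count2<-hist(as.numeric(test_df$test2[test_df$test=="Post"]),
  breaks=6)$counts
count3<-hist(as.numeric(test_df$test2[test_df$test=="Star"]),
  breaks=6)$counts
# Empty plot with one row per paper
plot(test_df$test2,test_df$test,ylim=c(0.4,3.6),type="n",
  main="Date of articles",xlab="Year",ylab="Journal",axes=FALSE)
# Tick positions in days since 1970-01-01, roughly mid-2004 to mid-2009
yearpos<-seq(12599,14425,length.out=6)
axis(1,at=yearpos,labels=2004:2009)
axis(2,at=1:3,labels=c("Globe and Mail","Post","Star"))
box()
# One filled band per paper, centred on its row; dividing by max*2 caps
# the half-height at 0.5 so neighbouring rows do not overlap
dispersion(yearpos,rep(1,6),count1/(max(count1)*2),
  type="l",fill="green")
dispersion(yearpos,rep(2,6),count2/(max(count2)*2),
  type="l",fill="red")
dispersion(yearpos[1:5],rep(3,5),count3/(max(count3)*2),
  type="l",fill="blue")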
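
If a plain stacked bar chart or a multiple line chart is closer to what you were after, the counts can also be built directly from the dates with table(). The following is only a rough sketch in base R, not a tested solution for your data. Since the reproducible code from the original post is not shown in this message, test_df below is made-up data with the same column names (test for the paper, test2 for the Date), and the sample size and date range are assumptions.

```r
# Hypothetical stand-in for the poster's data: one row per story.
set.seed(1)
papers <- c("Globe and Mail", "Post", "Star")
test_df <- data.frame(
  test  = sample(papers, 200, replace = TRUE),
  test2 = as.Date("2004-01-01") + sample(0:2191, 200, replace = TRUE)
)

# Turn each date into its calendar year, then cross-tabulate paper by year.
# unclass() drops the "table" class and leaves an integer matrix
# (rows = papers, columns = years).
yr <- format(test_df$test2, "%Y")
counts <- unclass(table(test_df$test, yr))

cols <- c("green", "red", "blue")

# Stacked bars: one bar per year, one coloured segment per paper.
barplot(counts, col = cols, legend.text = TRUE,
        main = "Date of articles", xlab = "Year", ylab = "Stories")

# Alternative: one line per paper over the same yearly bins.
matplot(t(counts), type = "l", lty = 1, col = cols, xaxt = "n",
        main = "Date of articles", xlab = "Year", ylab = "Stories")
axis(1, at = seq_len(ncol(counts)), labels = colnames(counts))
legend("topright", legend = rownames(counts), lty = 1, col = cols)
```

For monthly bins, format(test_df$test2, "%Y-%m") could be used in place of the year string. Note that a month with no stories at all would then be missing from the table rather than shown as zero.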



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue 29 Jun 2010 - 09:01:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 29 Jun 2010 - 09:40:43 GMT.
