Re: [R] R in the NY Times

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Wed, 07 Jan 2009 14:07:51 -0600

on 01/07/2009 09:29 AM Max Kuhn wrote:

>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."

>
> I think that I actually made this statement about both the SAS and
> Splus traffic...
>
> I wasn't really trying to be critical of SAS. I was trying to get
> across that SAS focused their resources on features that had nothing
> to do with *statistical analysis* (e.g. data warehousing etc.)

Presuming that the Google Groups archive of SAS-L is reasonably complete:

 http://groups.google.com/group/comp.soft-sys.sas/about

The monthly posting frequency data since 1993 is:

Posts <- structure(list(
    Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L, 1720L, 1826L,
            1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L, 357L),
    Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L, 1537L,
            1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA),
    Mar = c(NA, 658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L,
            2010L, 1755L, 2188L, 629L, 1711L, 1728L, 2712L, NA),
    Apr = c(NA, 681L, 792L, 1052L, 1315L, 784L, 1077L, 1163L, 1467L,
            2199L, 1757L, 1826L, 2169L, 2796L, 2766L, 2789L, NA),
    May = c(NA, 712L, 945L, 1163L, 1212L, 448L, 778L, 1963L, 1735L,
            2373L, 1863L, 1836L, 2283L, 3147L, 2974L, 2025L, NA),
    Jun = c(NA, 751L, 1002L, 999L, 1127L, 813L, 540L, 1615L, 1905L,
            2133L, 1701L, 2606L, 2407L, 2723L, 2691L, 2368L, NA),
    Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L, 476L, 1572L, 2027L,
            2445L, 1926L, 1843L, 2061L, 761L, 2435L, 2607L, NA),
    Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L, 1696L, 1976L,
            1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L, NA),
    Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L, 1439L,
            1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
    Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L, 1592L,
            2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA),
    Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L, 1636L,
            1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA),
    Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L, 1424L,
            1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA)),
    .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
    class = "data.frame",
    row.names = c("1993", "1994", "1995", "1996", "1997", "1998",
                  "1999", "2000", "2001", "2002", "2003", "2004",
                  "2005", "2006", "2007", "2008", "2009"))

> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
1994  546  511  658  681  712  751  763  975  703  805  752  666
1995  548  734  963  792  945 1002  775  969  745  691  773  765
1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
1998  894 1068  945  784  448  813  896  823  894 1129  733  492
1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA

One can then review the annual posting frequency via:

pdf("SAS-L.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "SAS-L Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()

There would appear to be marked increases in 2000 and again in 2006. However, traffic has been flat for the past 3 calendar years (2006 through 2008). No decline yet, but it will happen in due course...
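To put a number on "flat", here is a minimal sketch that uses only the 2006 to 2008 rows of the SAS-L table above. The object names (recent, annual, pct_change) are made up for illustration and are not part of the code I actually ran.

```r
# Monthly SAS-L counts for 2006-2008, copied from the Posts table above
recent <- rbind(
  "2006" = c(2722, 1645, 1711, 2796, 3147, 2723, 761, 2027, 2714, 2983, 2848, 2374),
  "2007" = c(2750, 926, 1728, 2766, 2974, 2691, 2435, 2592, 1868, 2320, 2112, 1948),
  "2008" = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921)
)
colnames(recent) <- month.abb

# Annual totals, one per row (year)
annual <- rowSums(recent)

# Year-over-year change, in percent of the previous year's total
pct_change <- round(100 * diff(annual) / head(annual, -1), 1)

print(annual)
print(pct_change)
```

The three totals come out at 28451, 27110 and 28538, so the year-to-year swings are small next to the jumps seen in 2000 and 2006.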

No comparable posting data table exists for S-News as far as I can find, so I wrote a quick program to read the S-News archive pages here:

  http://www.biostat.wustl.edu/archives/html/s-news/

and get monthly posting counts from the 'Thread' based html pages, where each post linked from a monthly page has a URL of the form:

http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html

Thus, the program I used is:

TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
Posts <- numeric(length(TD))

for (i in seq(along = TD))
{
  URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
               TD[i], "/threads.html", sep = "")

  cat(URL, "\n")

  # readLines() fails for a month with no archive page; record NA then
  if (!inherits(try(con <- readLines(URL)), "try-error"))
  {
    # Count the lines that contain a link to an individual message
    Posts[i] <- length(grep("msg.*\\.html", con))
    rm(con)
  } else {
    Posts[i] <- NA
  }
}

Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1998:2009
colnames(Posts) <- month.abb
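One caveat on the counting step: grep() returns the indices of the matching lines, so the count is the number of lines containing a message link, not the number of links. A small sketch with made-up HTML lines (the real threads.html markup may well differ) shows what gets counted:

```r
# Hypothetical lines standing in for one month's threads.html;
# the actual archive markup is an assumption here.
page <- c(
  "<html><body>",
  "<li><a href=\"msg00001.html\">First post</a></li>",
  "<li><a href=\"msg00002.html\">Re: First post</a></li>",
  "<li><a href=\"index.html\">Date index</a></li>",
  "</body></html>"
)

# grep() gives the indices of lines matching the pattern, so length()
# counts lines with a message link; the index.html line is not matched.
n_posts <- length(grep("msg.*\\.html", page))
print(n_posts)
```

If a page ever put two message links on one line, this approach would count them once, so the monthly figures are best read as approximate.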

That gives you:

Posts <- structure(c(
    NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48, 5,      # Jan
    273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA,     # Feb
    378, 313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA,     # Mar
    293, 300, 264, 300, 228, 196, 151, 182, 123, 48, 47, NA,    # Apr
    330, 334, 306, 331, 219, 189, 164, 174, 107, 46, 31, NA,    # May
    243, 254, 247, 282, 248, 217, 175, 109, 96, 34, 27, NA,     # Jun
    219, 284, 245, 258, 230, 221, 154, 159, 84, 47, 40, NA,     # Jul
    209, 270, 302, 260, 207, 187, 187, 144, 97, 39, 28, NA,     # Aug
    191, 300, 204, 260, 221, 186, 195, 107, 68, 35, 41, NA,     # Sep
    241, 253, 251, 229, 280, 295, 150, 98, 73, 70, 30, NA,      # Oct
    181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27, NA,      # Nov
    141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA),     # Dec
    .Dim = c(12L, 12L),
    .Dimnames = list(c("1998", "1999", "2000", "2001", "2002", "2003",
                       "2004", "2005", "2006", "2007", "2008", "2009"),
                     c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))



> Posts
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998  NA 273 378 293 330 243 219 209 191 241 181 141
1999 210 173 313 300 334 254 284 270 300 253 300 194
2000 264 313 285 264 306 247 245 302 204 251 261 176
2001 246 232 252 300 331 282 258 260 260 229 232 194
2002 230 255 242 228 219 248 230 207 221 280 228 177
2003 189 179 218 196 189 217 221 187 186 295 197 142
2004 197 230 257 151 164 175 154 187 195 150 176 176
2005 174 161 193 182 174 109 159 144 107  98  82  84
2006 109  87  99 123 107  96  84  97  68  73  53  20
2007  51  59  74  48  46  34  47  39  35  70  56  41
2008  48  63  58  47  31  27  40  28  41  30  27  36
2009   5  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

Which can then be graphed by:

pdf("S-News.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "S-News Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()

The consistent decline in posting frequency since 1999 is notable. The temporal association with the introduction of R is perhaps profound.
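Since rowSums(..., na.rm = TRUE) treats a partial year as if it were complete, the first and last bars understate those years. One possible way to compare them on equal footing is the mean per observed month. This is only a sketch on three rows copied from the S-News table above, and the names sub, n_months and monthly_mean are invented for the example:

```r
# Three S-News rows from the table above: a year missing January,
# a complete year, and a year with January only
sub <- rbind(
  "1998" = c(NA, 273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141),
  "2008" = c(48, 63, 58, 47, 31, 27, 40, 28, 41, 30, 27, 36),
  "2009" = c(5, rep(NA, 11))
)
colnames(sub) <- month.abb

# Number of months with data in each year
n_months <- rowSums(!is.na(sub))

# Mean posts per observed month, comparable across partial years
monthly_mean <- rowMeans(sub, na.rm = TRUE)

print(cbind(n_months, monthly_mean = round(monthly_mean, 1)))
```

The 2009 figure rests on part of a single month, so it says very little either way.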

As long as I am on the subject, I figured that I would do the same for R-Help. The downside is that readLines() (really url()) does not support https:, so I took a somewhat different approach, using wget to fetch each page first:

TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
Posts <- numeric(length(TD))

for (i in seq(along = TD))
{
  URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
               TD[i], "/thread.html", sep = "")

  cat(URL, "\n")

  # Fetch the page with wget, which saves it as thread.html
  CMD <- paste("wget", URL)
  system(CMD)

  if (file.exists("thread.html"))
  {
    con <- readLines("thread.html")
    # Count the lines that contain a link to an individual message
    Posts[i] <- length(grep("[0-9]+\\.html", con))
    rm(con)
    # Remove the file so a failed download next month is detected
    unlink("thread.html")
  } else {
    Posts[i] <- NA
  }
}

Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1997:2009
colnames(Posts) <- month.abb
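The paste() call above relies on month.name being recycled across the repeated years, and the matrix() call relies on the counts being ordered month within year. A reduced sketch with two years and stand-in counts (1 to 24 rather than real posting numbers) shows how the pieces line up:

```r
years <- 1997:1998            # two years only, to keep the output small

# month.name (length 12) is recycled against the 24 repeated years
TD <- paste(rep(years, each = 12), month.name, sep = "-")

# Stand-in counts: 1..24, NOT real posting numbers
counts <- seq_along(TD)

# TD runs through the months within each year, so filling by row
# puts one year per row and one month per column.
m <- matrix(counts, ncol = 12, byrow = TRUE,
            dimnames = list(as.character(years), month.abb))

print(TD[c(1, 12, 13)])
print(m)
```

Filling by column (the default) would instead scatter consecutive months down the rows, which is why byrow = TRUE matters here.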

This gives you:

Posts <- structure(c(
    NA, 135, 226, 205, 558, 884, 1017, 1116, 1746, 2075, 1714, 2490, 462,  # Jan
    NA, 79, 145, 355, 583, 697, 1137, 1580, 1724, 1920, 1907, 2583, NA,    # Feb
    NA, 114, 195, 377, 651, 880, 1203, 1946, 1703, 2270, 2191, 2740, NA,   # Mar
    92, 101, 189, 377, 470, 965, 1488, 1657, 2057, 1818, 2145, 2487, NA,   # Apr
    36, 90, 161, 504, 552, 1057, 1268, 1561, 1887, 2029, 2210, 2517, NA,   # May
    47, 105, 186, 418, 550, 926, 1319, 1714, 2056, 1811, 2307, 2774, NA,   # Jun
    41, 110, 184, 293, 615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA,   # Jul
    37, 64, 148, 356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA,    # Aug
    40, 94, 203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA,    # Sep
    76, 96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037, NA,   # Oct
    61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594, 2730, NA,  # Nov
    57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450, 2028, 2399, NA),  # Dec
    .Dim = c(13L, 12L),
    .Dimnames = list(c("1997", "1998", "1999", "2000", "2001", "2002", "2003",
                       "2004", "2005", "2006", "2007", "2008", "2009"),
                     c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))

> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1997   NA   NA   NA   92   36   47   41   37   40   76   61   57
1998  135   79  114  101   90  105  110   64   94   96  184  105
1999  226  145  195  189  161  186  184  148  203  231  318  221
2000  205  355  377  377  504  418  293  356  434  418  433  422
2001  558  583  651  470  552  550  615  562  678  657  825  530
2002  884  697  880  965 1057  926  918  824  705 1055 1038  742
2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
2009  462   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA

Which again can be graphed as:

pdf("R-Help.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
              beside = TRUE,
              cex.names = 0.6, main = "R-Help Traffic",
              cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
      line = 2, cex = 0.5)

dev.off()

Now....there's a healthy growth curve.... :-)

Note that the annual traffic volume for 2008 on R-Help exceeds that on SAS-L. For convenience, I am attaching each of the 3 plots.
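For anyone who wants to check the 2008 comparison without rebuilding all three objects, the following sketch uses just the 2008 rows copied from the three tables above. The list name posts_2008 is made up for the example.

```r
# 2008 monthly counts (Jan..Dec), copied from the three tables above
posts_2008 <- list(
  "SAS-L"  = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921),
  "S-News" = c(48, 63, 58, 47, 31, 27, 40, 28, 41, 30, 27, 36),
  "R-Help" = c(2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)
)

# Annual total per list
totals <- sapply(posts_2008, sum)
print(totals)

# Ratio of R-Help traffic to SAS-L traffic for the year
print(round(totals[["R-Help"]] / totals[["SAS-L"]], 2))
```

That gives 28538 posts for SAS-L, 476 for S-News and 32828 for R-Help in 2008.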

Regards,

Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 07 Jan 2009 - 20:11:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 08 Jan 2009 - 00:30:06 GMT.
