From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Tue 28 Jun 2005 - 12:37:56 GMT

The issue is not with boxplot, but with split. boxplot.formula()
calls boxplot(split(mf[[response]], mf[-response]), ...),
but look at what split() returns when there are empty levels in
the factor:

$"2"
[1] -1.1296642 -0.4808355 -0.2789933  0.1220718  0.1287742 -0.7573801

$"3"
[1]  1.2320902  0.5090700 -1.5508074  2.1373780  1.1681297 -0.7151561

The "culprit" is the following in split.default():

If this is to be "fixed", I suppose an additional argument, e.g., drop=TRUE, can be added, and the corresponding line mentioned above changed to something like:

if (drop || !is.factor(f)) f <- factor(f)

Then this additional argument can be passed on from boxplot.formula() to split().
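To make the suggestion concrete, here is a minimal sketch of the idea. split_sketch() is a hypothetical stand-in written for illustration only (it is not the actual split.default() source), and the data are made up; the first line of the function body is the change proposed above.

```r
# Hypothetical stand-in for split.default(), for illustration only.
split_sketch <- function(x, f, drop = TRUE) {
    # Proposed line: re-factor (and so drop unused levels) only on
    # request, or when f is not yet a factor.
    if (drop || !is.factor(f)) f <- factor(f)
    lev <- levels(f)
    # One element per level; a level with no data yields a
    # zero-length vector.
    out <- lapply(lev, function(l) x[!is.na(f) & f == l])
    names(out) <- lev
    out
}

set.seed(1)                                     # made-up example data
f <- factor(rep(2:3, each = 6), levels = 1:3)   # level "1" is empty
x <- rnorm(12)

names(split_sketch(x, f))                # empty level dropped
names(split_sketch(x, f, drop = FALSE))  # empty level kept
```

With drop = FALSE the result keeps a zero-length component for level "1", which is what a caller such as boxplot.formula() would need in order to reserve a position for it.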

Just my $0.02...

Andy

From: mwtoews@sfu.ca

I consider this to be an old bug, which also persists in Splus 7. It
is unnecessary and annoying.

## Section 1: Consider a simple data frame with a factor that has
## three possible levels

d <- data.frame(a=sort(rnorm(10)*10),
                b=factor(c(rep("A",4), rep("C",6)), levels=c("A","B","C")))
plot(a ~ b, d)  # plots only two of the three levels: the empty level
                # "B" is dropped, so "C" lands in the second position

# if I tried to plot a blank in between the two boxplots:
plot(a ~ b, d, at=1:3)     # nope: error
plot(a ~ b, d, at=c(1,3))  # nope: out of range (also xlim does
                           # nothing for the formula boxplot method)

# to make this work with the current R/Splus implementation, I have
# to add a zero:
d <- rbind(d, data.frame(a=0,b="B"))  # which I don't want to do,
                                      # since there are no "B"
plot(a ~ b, d)  # yuk!

## Section 2: Why is this important? Consider another realistic
## example of [synthetic] daily temperature

temp <- 5 - 10*cos(1:365*2*pi/365) + rnorm(365)*3
# jday is Julian day [1,365]
d1 <- data.frame(year=2005, jday=1:365, date=NA, month=NA, temp)
d1$date <- as.Date(paste(d1$year, d1$jday), "%Y %j")
d1$month <- factor(months(d1$date,TRUE), levels=month.abb)
plot(temp ~ month, d1)  # perfect, in a perfect meteorological world

# now let's remove some data
d2 <- d1[!d1$month %in% c("Mar","Apr","May","Sep"),]
tapply(d2$temp,d2$month,mean)  # perfect
plot(temp ~ month, d2)  # ugly, not 12 months, etc. (despite
                        # having 12 levels)

# again the only cure is to add zeros to the missing months
# (unnecessary forgery of data)
d3 <- d2
for (i in c("Mar","Apr","May","Sep")) {
    d3 <- rbind(d3,NA)
    d3$month[nrow(d3)] <- i
    d3$temp[nrow(d3)] <- 0
}
plot(temp ~ month, d3)  # still ugly, but at least has 12 months!

## Section 3: Solution
The obvious solution is to leave a blank where a boxplot should go,
similar to what tapply does. This would have 1:n positions, where n
is the number of levels of the factor, not the number of levels that
have one or more observations. Each position should also have a label
under the tick mark.
I don't see any reason why the missing data should be completely
ignored. Users wishing not to plot the blanks where the data could go
can simply type (for back-compatibility):

d2$month <- factor(d2$month)  # from 12 to 8 levels

Which will produce the same 8-level plot as above.

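The behaviour requested in Section 3 can be approximated by hand. This is a sketch only: it assumes a version of R whose boxplot() accepts zero-length list components and leaves their positions blank, which may not hold for the version in this report. The data frame is rebuilt here without the random noise so the block is self-contained, and by_level is a name introduced for illustration.

```r
# Rebuild a d2-like data frame (deterministic, locale-independent months).
date  <- as.Date("2005-01-01") + 0:364
month <- factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)
temp  <- 5 - 10 * cos(1:365 * 2 * pi / 365)
d2    <- data.frame(month, temp)
d2    <- d2[!d2$month %in% c("Mar", "Apr", "May", "Sep"), ]

# One list element per factor level, including the four empty ones.
by_level <- lapply(levels(d2$month), function(l) d2$temp[d2$month == l])
names(by_level) <- levels(d2$month)

# plot = FALSE returns the box statistics without drawing; omit it to
# draw one labelled position per level, with gaps at the empty months.
bp <- boxplot(by_level, plot = FALSE)
bp$names  # group labels, one per level
bp$n      # observations per level (zero for the removed months)
```

Building the list from levels() rather than from the values actually present is what keeps one position per level, which is the same idea as honouring the factor levels inside split().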
## Section 4: Conclusion
I consider this to be a bug with regard to data representation, and
this function is not consistent with other functions like `tapply'.
Considering that the back-compatibility solution is very simple, and
most users would probably prefer a result including all levels (NULL
or real values in each), I feel this is an appropriate improvement (and
easy to fix in the code). At the very least, include an option to
honour the factor levels.

Thanks.
-mt

--please do not edit the information below--

Version:
 platform = powerpc-apple-darwin8.1.0
 arch = powerpc
 os = darwin8.1.0
 system = powerpc, darwin8.1.0
 status = Patched
 major = 2
 minor = 1.1
 year = 2005
 month = 06
 day = 26
 language = R

Locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

Search Path:
 .GlobalEnv, package:methods, package:stats, package:graphics,
 package:grDevices, package:utils, package:datasets, Autoloads,
 package:base
