Re: [Rd] boxplot by factor (Package base version 2.1.1) ( PR#7976)

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 28 Jun 2005 - 12:57:42 GMT

"Liaw, Andy" <andy_liaw@merck.com> writes:

> The issue is not with boxplot, but with split. boxplot.formula()
> calls boxplot(split(mf[[response]], mf[-response]), ...),
> but look at what split() returns when there are empty levels in
> the factor:
>
> > f <- factor(gl(3, 6), levels=1:5)
> > y <- rnorm(f)
> > split(y, f)
> $"1"
> [1] 0.4832124 1.1924811 0.3657797 1.7400198 0.5577356 0.9889520
>
> $"2"
> [1] -1.1296642 -0.4808355 -0.2789933 0.1220718 0.1287742 -0.7573801
>
> $"3"
> [1] 1.2320902 0.5090700 -1.5508074 2.1373780 1.1681297 -0.7151561
>
> The "culprit" is the following in split.default():
>
> f <- factor(f)
>
> which drops empty levels in f, if there are any. BTW, ?split doesn't
> mention what it does in such a situation. Perhaps it should?
>
> If this is to be "fixed", I suppose an additional argument, e.g.,
> drop=TRUE, can be added, and the corresponding line mentioned
> above changed to something like:
>
> if (drop || !is.factor(f)) f <- factor(f)
>
> Then this additional argument can be passed on from boxplot.formula() to
> split().
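
The proposal quoted above can be tried out without touching base R. What follows is only a minimal sketch: split_sketch() is a hypothetical stand-in written for illustration, not the source of split.default(), and it uses deterministic data in place of rnorm() so the result is reproducible.

```r
# Hypothetical helper for illustration only; not the real split.default().
split_sketch <- function(x, f, drop = TRUE) {
    # The proposed line: re-factor only when dropping is requested
    # or when f is not a factor yet.
    if (drop || !is.factor(f)) f <- factor(f)
    # Collect the values of x for every level of f, empty levels included.
    lapply(setNames(levels(f), levels(f)), function(lev) x[f == lev])
}

f <- factor(gl(3, 6), levels = 1:5)
y <- seq_along(f)

names(split_sketch(y, f))                 # "1" "2" "3"
names(split_sketch(y, f, drop = FALSE))   # "1" "2" "3" "4" "5"
```

With drop = FALSE the empty levels "4" and "5" survive as zero-length components, which is what a caller such as boxplot.formula() would need in order to reserve a slot for them.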

Alternatively, I suspect that the intention was as.factor() rather than factor(). It does require a bit of care to fix it that way, though. There could be problems with empty levels popping up in unexpected places.
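
The difference between the two calls is easy to check on the factor from the quoted example. This is a small illustration of the behaviour of factor() and as.factor() on an existing factor, not a patch:

```r
f <- factor(gl(3, 6), levels = 1:5)

nlevels(factor(f))      # 3: factor() rebuilds the levels from the values present
nlevels(as.factor(f))   # 5: as.factor() returns an existing factor unchanged

# Empty levels then show up downstream, here as zero counts for "4" and "5".
table(as.factor(f))
```

Any code that iterates over the levels would then see the two empty groups, which is the kind of place where they could pop up unexpectedly.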

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Jun 28 23:04:57 2005
