Re: [R] Aggregating zoo object with NAs in multiple column

From: Achim Zeileis <Achim.Zeileis_at_wu-wien.ac.at>
Date: Thu, 24 Jul 2008 02:26:39 +0200 (CEST)

On Wed, 23 Jul 2008, Abiel Reinhart wrote:

> I would like to run an aggregation on a zoo object that has multiple series
> in it, with one or more series having NA values. The problem is that by
> default the aggregate function will produce an NA value in each aggregated
> period that contains an NA. For instance, if I run aggregate(x,
> as.yearmon(index(x)), mean) on the example object "x" which is printed
> below, I will just get a bunch of NAs for January.

This is not specific to zoo series; the function mean() always behaves like this. If you want to remove the NAs before averaging, you have to pass the argument na.rm = TRUE to mean(). The easiest way to do this is

   aggregate(x, as.yearmon, mean, na.rm = TRUE)

Z
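For illustration, here is a minimal sketch of the call above on a small made-up zoo object. The data, the index dates, and the names idx and m are assumptions for this example, not the poster's object. It shows that the extra na.rm = TRUE argument is handed on to mean(), so NAs are dropped column by column rather than row by row.

```r
library(zoo)

# Hypothetical data: two series over six days spanning two months,
# with one NA in each series (both NAs fall in January).
idx <- as.Date("2001-01-29") + 0:5   # 2001-01-29 through 2001-02-03
x <- zoo(cbind(a = c(1, NA, 3, 4, 5, 6),
               b = c(2, 4, NA, 8, 10, 12)),
         order.by = idx)

# Default behaviour: a single NA in a month makes that month's mean NA.
m_default <- aggregate(x, as.yearmon, mean)
m_default

# Extra arguments are passed on to mean(), so each column's NAs are
# dropped independently and no complete rows are thrown away.
m <- aggregate(x, as.yearmon, mean, na.rm = TRUE)
m
```

With na.rm = TRUE, the January mean of each column uses all of that column's non-NA values, which is what splitting the object into one-series pieces would give, without the splitting.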

> This behavior is perfectly logical. The problem is that if I try to use the
> na.omit() function, it will throw away the entire line if even one series
> has an NA value. For example, in the table below, you can see that running
> na.omit() will throw out periods 2001-01-06 through 2001-01-10. But since
> each of these lines contains many non-NA readings, we are throwing away real
> information that should be used in the calculation of the means for January.
> The mean for column B should include every non-NA value for the month, but since A
> has an NA value on January 6, the January 6 value for B will be dropped as
> well. Same thing for columns C, D, and E.
>
> I suppose one solution would be to break the object into five one-series
> objects, run aggregate(na.omit(item), as.yearmon(index(na.omit(item))),
> mean) on each of them, then bind them back together, but this is rather
> annoying. Is there a better way?
>
> Thanks.
>
> Abiel
>
>                    a          b          c          d         e
> 2001-01-01 0.5183099 0.62792449 0.90859932 0.56578026 0.3991120
> 2001-01-02 0.2759420 0.96788392 0.30789409 0.76159986 0.3122280
> 2001-01-03 0.3263367 0.41224859 0.69756281 0.27406235 0.6902459
> 2001-01-04 0.3681782 0.41167564 0.02734471 0.39348676 0.8370692
> 2001-01-05 0.2550825 0.65790206 0.65134885 0.92537263 0.4143775
> 2001-01-06        NA 0.09076128 0.35209944 0.70821994 0.6659275
> 2001-01-07 0.4749008         NA 0.73579892 0.67311239 0.2155689
> 2001-01-08 0.7314498 0.56542607         NA 0.37529408 0.9313593
> 2001-01-09 0.5560702 0.47944318 0.01946189         NA 0.7055763
> 2001-01-10 0.4848510 0.12003527 0.31297935 0.41487588        NA
> 2001-01-11 0.0902985 0.88107285 0.33374604 0.26173483 0.3062338
> 2001-01-12 0.3664127 0.35366508 0.97760256 0.90784835 0.7399498
> 2001-01-13 0.6394206 0.05157520 0.38823937 0.92289256 0.6464278
> 2001-01-14 0.1949957 0.29738760 0.25224214 0.00024017 0.1228440
> 2001-01-15 0.7723980 0.99391775 0.22869908 0.97916413 0.1066641
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



Received on Thu 24 Jul 2008 - 00:30:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 24 Jul 2008 - 00:32:16 GMT.
