Re: [R] Generation of missing values in a time serie...

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Wed 14 Dec 2005 - 07:06:24 EST

Thinking about this some more, the trick I discussed is probably not the best way to do it, since it's possible that a future version of zoo will disallow illegal zoo objects (ones with non-unique times) entirely. I think a better way is to construct the series directly from the data and the times, like this:

aggregate(zoo(z.data), round(z.time, 1), tail, 1)

where z.data is the data matrix and z.time is the vector of times. Done this way, the variable z, which is an illegal zoo object, never needs to be created. Since z is what I can reproduce from your post, though, here are the two inputs expressed in terms of z:

z.data <- coredata(z)
z.time <- time(z)
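
For anyone who wants to try this without first building the illegal object, here is a minimal sketch. It assumes the zoo package is installed and retypes the six rows from the dput() output quoted further down. The name z.legal is only illustrative.

```r
library(zoo)

# Times and data retyped from the dput() output in the quoted message;
# note the duplicated time 123.241137 in rows 2 and 3.
z.time <- c(123.125683, 123.241137, 123.241137,
            123.472631, 123.588613, 123.704594)
z.data <- cbind(time = z.time,
                flow = 0,
                seq  = 967:972,
                ts   = z.time,
                x    = c(13394, 12680, 12680, 12680, 12680, 12680),
                rtt  = c(0.798205, rep(0.796258, 5)),
                size = 1472)

# zoo(z.data) gets the default index 1, 2, ..., 6, which is unique, so no
# illegal object is created. aggregate() then groups the rows by the
# rounded times and keeps the last row of each group.
z.legal <- aggregate(zoo(z.data), round(z.time, 1), tail, 1)

time(z.legal)              # the discretized, now unique, times
coredata(z.legal)[, "seq"] # seq 968 is dropped in favour of 969
```

The result has one row per rounded time, so it can be handed to as.ts() in the same way as in the earlier message. Replacing tail, 1 with mean would average the rows that share a rounded time instead of keeping the last one.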

On 12/13/05, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
> Yes, this is the definition of a time series and therefore of a zoo object.
> A time series is a mathematical function, i.e. it assigns a single element
> of the range to each element of the domain. This data does not describe
> a time series.
>
> Also it has no underlying regularity as the warning message states.
> To use as.ts one wants a series with an underlying regularity that has
> gaps and then as.ts will fill in the gaps with NAs.
>
> If we don't have an underlying regularity the question is not well posed
> but it's likely we want to discretize time. The zoo command itself is
> somewhat forgiving, at least in this case, i.e. it allows one to specify
> this illegal zoo object with non-unique times for purposes of discretization;
> however, such a zoo object should not be used other than to get a legal
> zoo object out.
>
> For example, in the following we round the times to one decimal place
> and then within each set of values at the same discretized time take the
> last one. (Alternatively, specify mean instead of tail, 1 if the average
> is preferred.) Then we convert that to a ts object:
>
> > as.ts(aggregate(z, round(time(z), 1), tail, 1))
> Time Series:
> Start = c(123, 2)
> End = c(123, 8)
> Frequency = 10
> time flow seq ts x rtt size
> 123.1 123.1257 0 967 123.1257 13394 0.798205 1472
> 123.2 123.2411 0 969 123.2411 12680 0.796258 1472
> 123.3 NA NA NA NA NA NA NA
> 123.4 NA NA NA NA NA NA NA
> 123.5 123.4726 0 970 123.4726 12680 0.796258 1472
> 123.6 123.5886 0 971 123.5886 12680 0.796258 1472
> 123.7 123.7046 0 972 123.7046 12680 0.796258 1472
>
> On 12/13/05, Alvaro Saurin <saurin@dcs.gla.ac.uk> wrote:
> >
> > I think I have found the error. It appears when there are two entries
> > with the same time. Using as input file:
> >
> > --------- CUT --------
> > # Output format for PCKs:
> > # TIME FLOW P [+-] SEQ TS X RTT SIZE
> > #
> > 123.125683 0 P + 967 123.125683 13394 0.798205 1472
> > 123.241137 0 P + 968 123.241137 12680 0.796258 1472
> > 123.241137 0 P + 969 123.241137 12680 0.796258 1472
> > 123.472631 0 P + 970 123.472631 12680 0.796258 1472
> > 123.588613 0 P + 971 123.588613 12680 0.796258 1472
> > 123.704594 0 P + 972 123.704594 12680 0.796258 1472
> > --------- CUT --------
> >
> > I run the following code:
> >
> > --------- CUT --------
> > h_types <- list (0, 0, NULL, NULL, 0, 0, 0, 0, 0)
> > h_names <- list ("time", "flow", "seq", "ts", "x", "rtt", "size")
> >
> > pcks_file <- pipe ("grep ' P ' data", "r")
> > pcks <- scan (pcks_file, what = h_types, comment.char = '#',
> > fill = TRUE)
> > mat_df <- data.frame (pcks[1:2], pcks[5:9])
> > mat <- as.matrix (mat_df)
> > colnames (mat) <- h_names
> > z <- zoo (mat, mat [,"time"])
> > --------- CUT --------
> >
> > The dput of 'z' shows:
> >
> > --------- CUT --------
> > structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613,
> > 123.704594, 0, 0, 0, 0, 0, 0, 967, 968, 969, 970, 971, 972, 123.125683,
> > 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 13394,
> > 12680, 12680, 12680, 12680, 12680, 0.798205, 0.796258, 0.796258,
> > 0.796258, 0.796258, 0.796258, 1472, 1472, 1472, 1472, 1472, 1472
> > ), .Dim = c(6, 7), .Dimnames = list(c("1", "2", "3", "4", "5",
> > "6"), c("time", "flow", "seq", "ts", "x", "rtt", "size")),
> > index = structure(c(123.125683, 123.241137, 123.241137, 123.472631,
> > 123.588613, 123.704594), .Names = c("1", "2", "3", "4", "5", "6")),
> > class = "zoo")
> > --------- CUT --------
> >
> > If I try 'as.ts(z)', it fails. If I remove the duplicate entry, I
> > can convert it to a TS with no problem. Is this intentional?
> > Because then I have to filter the input matrix... But, anyway, the
> > output matrix, after filtering, doesn't seem regular:
> >
> > --------- CUT --------
> > > as.ts (z)
> > Time Series:
> > Start = 1
> > End = 5
> > Frequency = 1
> > time flow seq ts x rtt size
> > 1 123.1257 0 967 123.1257 13394 0.798205 1472
> > 2 123.2411 0 969 123.2411 12680 0.796258 1472
> > 3 123.4726 0 970 123.4726 12680 0.796258 1472
> > 4 123.5886 0 971 123.5886 12680 0.796258 1472
> > 5 123.7046 0 972 123.7046 12680 0.796258 1472
> > Warning message:
> > 'x' does not have an underlying regularity in: as.ts.zoo(z)
> > --------- CUT --------
> >
> > Weird...
> >
> >
> > On 13 Dec 2005, at 16:33, Gabor Grothendieck wrote:
> >
> > > Please provide a reproducible example. Note that dput(x) will output
> > > an R object in a way that can be copied and pasted into another
> > > session.
> > >
> > > On 12/13/05, Alvaro Saurin <saurin@dcs.gla.ac.uk> wrote:
> > >>
> > >> On 13 Dec 2005, at 13:08, Gabor Grothendieck wrote:
> > >>
> > >>> Your variable mat is not a matrix; it's a data frame. Check it with:
> > >>>
> > >>> class(mat)
> > >>>
> > >>> Here is an example:
> > >>>
> > >>> x <- cbind(A = 1:4, B = 5:8)
> > >>> tt <- c(1, 3:4, 6)
> > >>>
> > >>> library(zoo)
> > >>> x.zoo <- zoo(x, tt)
> > >>> x.ts <- as.ts(x.zoo)
> > >>
> > >> Fixed, but anyway it fails:
> > >>
> > >>> h_types <- list (0, 0, NULL, NULL, 0, 0, 0, 0, 0)
> > >>> h_names <- list ("time", "flow", "seq", "ts", "x", "rtt",
> > >>> "size")
> > >>
> > >>> pcks_file <- pipe ("grep ' P ' server.dat", "r")
> > >>> pcks <- scan (pcks_file, what = h_types,
> > >>               comment.char = '#', fill = TRUE)
> > >>
> > >>> mat_df <- data.frame (pcks[1:2], pcks[5:9])
> > >>> mat <- as.matrix (mat_df)
> > >>> colnames (mat) <- h_names
> > >>
> > >>> class (mat)
> > >> [1] "matrix"
> > >>
> > >>> z <- zoo (mat, mat [,"time"])
> > >>
> > >>> z
> > >>            time     flow      seq       ts           x      rtt        size
> > >> 1.0009 1.000893 0.000000 0.000000 1.000893 1472.000000 0.000000 1472.000000
> > >> 1.5145 1.514454 0.000000 1.000000 1.514454 2944.000000 0.513142 1472.000000
> > >> 2.0151 2.015093 0.000000 2.000000 2.015093 2944.000000 0.513142 1472.000000
> > >> 2.515  2.515025 0.000000 3.000000 2.515025 4806.000000 0.504488 1472.000000
> > >> 2.822  2.821976 0.000000 4.000000 2.821976 5730.000000 0.496728 1472.000000
> > >> [...]
> > >>
> > >>> as.ts (z)
> > >> Error in if (del == 0 && to == 0) return(to) :
> > >> missing value where TRUE/FALSE needed
> > >>
> > >> Any idea? Thanks for your help.
> > >>
> > >> Alvaro
> > >>
> > >>
> > >> --
> > >> Alvaro Saurin <alvaro.saurin@gmail.com> <saurin@dcs.gla.ac.uk>
> > >>
> > >>
> > >>
> > >>
> >
> > --
> > Alvaro Saurin <alvaro.saurin@gmail.com> <saurin@dcs.gla.ac.uk>
> >
> >
> >
> >
>
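
As a check on the point made in the quoted messages that as.ts fills the gaps with NAs when the series does have an underlying regularity, here is a small sketch that reuses the x.zoo example from earlier in the thread. It assumes the zoo package is installed. The comments describe what I would expect to see rather than pasted output.

```r
library(zoo)

# Times 1, 3, 4, 6 sit on a regular grid with spacing 1 but skip 2 and 5.
x  <- cbind(A = 1:4, B = 5:8)
tt <- c(1, 3:4, 6)

x.zoo <- zoo(x, tt)
x.ts  <- as.ts(x.zoo)

# The ts object should run from time 1 to time 6 with frequency 1,
# with rows of NA at the two missing times.
time(x.ts)
is.na(x.ts[, "A"])
```

The times here are unique and spaced on a common grid, which is why this series converts cleanly while the packet data in the thread does not until it has been discretized.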



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Dec 14 08:42:14 2005
