Re: [R] Another quantmod question

From: Russ Abbott <russ.abbott_at_gmail.com>
Date: Sun, 08 May 2011 21:54:39 -0700

Jeff,

Clearly you (and others) have put a lot of work into xts -- and I'm the beneficiary. So I'll stop complaining.

Thanks for the class (both code and explanation).

-- Russ

On Sun, May 8, 2011 at 8:23 PM, Jeff Ryan <jeff.a.ryan_at_gmail.com> wrote:

> Hi Russ,
>
> We're of course getting into some incredibly fine-level detail on how all
> of this works. I'll try to explain the issues as I recall them from the
> development of xts and cbind.xts.
>
> xts started as an extension of zoo. zoo is an extension of 'ts' (greatly
> simplified comparison of course, but stay with me)
>
> Achim and Gabor have put tremendous effort into the design of zoo - with a
> primary focus on keeping it consistent with base R behavior. That is, try
> not to introduce unnecessary changes to the interface an R user is
> accustomed to. The logic being that this makes for a more consistent
> interface as well as an easier learning curve and hence greater/faster
> adoption rate.
>
> 'xts' extends this, though with a bit more flexibility in terms of
> consistency. Why? Simply put - some things about R annoyed me coming from a
> time-series background. Number one was the fact that lag() is backwards.
> Backwards from expectation, nearly all literature, and all standard
> definitions. So xts breaks with lag(, n=1) behavior. This is obviously
> confusing to some - but was the gamble I was willing to take - consistency
> (with R) be damned! ;-)
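
A minimal sketch of the lag() difference described above, assuming the xts package is installed. The five-point series is made up for illustration. In base R, lag() on a 'ts' object with a lag of 1 moves the time base back one step, so each value shows up one period earlier; on an xts object the same call keeps the index and pairs each timestamp with the previous observation.

```r
library(xts)  # assumption: xts (and its dependency zoo) is installed

# Base R 'ts': lag(, 1) moves the time base back one step; values are unchanged.
base_series <- ts(1:5)              # observed at times 1..5
base_lagged <- lag(base_series, 1)
time(base_lagged)                   # the times now run from 0 to 4

# xts: lag(, 1) keeps the index and shifts the values forward,
# so each date carries the previous observation and the first one is NA.
xts_series <- xts(1:5, order.by = as.Date("2007-01-01") + 0:4)
xts_lagged <- lag(xts_series, 1)
xts_lagged
```

Comparing the two results side by side is usually the quickest way to check which convention a given time-series class follows.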
>
> So, now back to cbind. cbind and merge in zoo-land (and xts by extension)
> are the same. This isn't the case for other classes that use these - but
> that is 'allowable' and 'expected' under a class dispatch system. The docs
> for ?cbind state:
>
> For ‘cbind’ (‘rbind’) the column (row) names are taken from the
> ‘colnames’ (‘rownames’) of the arguments if these are matrix-like.
> Otherwise from the names of the arguments or where those are not
> supplied and ‘deparse.level > 0’, by deparsing the expressions
> given, for ‘deparse.level = 1’ only if that gives a sensible name
> (a ‘symbol’, see ‘is.symbol’).
>
> Based on that, I'd argue that xts does it "right". Of course I'll also
> point out that this is incorrect thinking as well - since this is a
> description for the generic - and not for xts. But again in a highly
> configurable object/class system, where you start to make a distinction of
> right and wrong is itself up for debate.
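
The rule quoted from ?cbind can be seen with base R alone. This is a small sketch with made-up objects (named_matrix, plain_vector): a matrix-like argument keeps its own column name even when the call supplies an argument name, while a plain vector takes the argument name.

```r
# A matrix that already has a column name, and a plain vector that does not.
named_matrix <- matrix(1:3, ncol = 1, dimnames = list(NULL, "orig"))
plain_vector <- 1:3

# Matrix-like argument: its own colnames win; the argument name is ignored.
from_matrix <- cbind(renamed = named_matrix)
colnames(from_matrix)   # "orig"

# Vector argument: the argument name becomes the column name.
from_vector <- cbind(renamed = plain_vector)
colnames(from_vector)   # "renamed"
```

This is the same pattern as in the original question: the xts objects there are matrix-like and already carry the name "close", whereas IX[, "I"] is a plain vector.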
>
> At the other end of the argument spectrum is _why not_. That is, why can't
> cbind.xts handle the names to replace the colnames of objects passed in?
> Here is where I'll point out that I am really just going by memory.
>
> Three major items are involved in cbind. One is that dispatch is quite
> unlike nearly every other dispatch in R. This is a fact - nothing to do
> with xts.
>
> * cbind isn't a generic (it's an .Internal call)
> * it uses ...
> * cbind can be called in numerous ways (I'll list only the common ones -
> but with R you can do even crazier things)
>
> do.call(cbind,
> do.call(cbind.xts,
> cbind,
> cbind.xts,
> merge,
> merge.xts,
> do.call(merge,
> do.call(merge.xts
>
> The rules of dispatch on cbind are really at a level that R-help has no
> business discussing. The second part is where things actually get tricky
> though. They all behave differently with respect to how args are handled -
> when eval'd, etc.
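
A base R sketch of how the calling style alone changes what cbind can see. It uses only plain vectors (the names a and b are made up) and does not involve xts; it is meant to illustrate why naming is hard to make consistent across the call forms listed above, not how cbind.xts is implemented.

```r
a <- 1:3
b <- 4:6

# Direct call: the arguments are symbols, so deparse.level = 1 names the columns.
direct_names <- colnames(cbind(a, b))
direct_names                          # "a" "b"

# do.call with an unnamed list: cbind receives values, not symbols, so no names.
unnamed_names <- colnames(do.call(cbind, list(a, b)))
unnamed_names                         # NULL

# do.call with a named list: the list names arrive as argument names.
named_names <- colnames(do.call(cbind, list(a = a, b = b)))
named_names                           # "a" "b"

# Printing the body shows how base R implements cbind.
body(base::cbind)
```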
>
> I'm sure you have read how R strains itself on 'big data'. This is true
> and false. Improper use (or just naive use) can cause object copies in
> places you really don't want. Much of xts at this point is implemented in
> custom C code. The gain here is that you can make it eas(ier) to avoid
> copies until you need them by writing in C. Obvious, but needs to be said.
>
> To figure out what the columns have - and if names are attached to the
> objects in the pairlist (the "..." in this context) - you have to be very
> careful. Touch anything in the wrong place or wrong time and you lose a
> figurative arm and leg to memory copies. So, in 99.9999% of cases - where
> you aren't naming (which would be an extra feature above and beyond c(olumn)
> binding [the reason for cbind]) - you run a very real risk of getting nailed
> for copies you don't want. On 10MM obs that is almost manageable. On hundreds
> of millions or billions - it is kill -9 time.
>
> To compound the issue - recall all of those different dispatch methods.
> Yep - they all behave just a bit differently. How? Honestly - I don't
> know or care. I simply know you can't easily make the behavior consistent
> amongst those calls. I have tried. And tried.
>
> End of day, and a very long R-help email, xts is different than base R. It
> is even different than its 'parent' zoo behavior. But in exchange for this
> difference (and bit of learning/adjustment) you get a class that is faster
> than anything else.
>
> Period.
>
> > x <- .xts(1:1e7, 1:1e7) # our time series object
> > m <- coredata(x) # a matrix
>
> > str(x)
> An ‘xts’ object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
> Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
> Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
> xts Attributes:
> NULL
>
> > str(m)
> int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
>
> > system.time(x[,1]) # get the first column
>    user  system elapsed
>   0.017   0.000   0.017
> > system.time(m[,1]) # ditto
>    user  system elapsed
>   0.152   0.000   0.153
>
> Yep, nearly 10x faster than a matrix op - AND you still have the time
> index. To get there you need to sometimes make sacrifices. xts does, though
> I like to think they are well thought out and consistent*
>
> *enough ;-)
>
> Best,
> Jeff
>
>
> On Sun, May 8, 2011 at 8:57 PM, Joshua Ulrich <josh.m.ulrich_at_gmail.com> wrote:
>
>> Russ,
>>
>> On May 8, 2011 6:29 PM, "Russ Abbott" <russ.abbott_at_gmail.com> wrote:
>> >
>> > Hi Jeff,
>> >
>> > The xts class has some very nice features, and you have done a valuable
>> > service in developing it.
>> >
>> > My primary frustration is how difficult it seems to be to find out what
>> > went wrong when my code doesn't work. I've been writing quite
>> > sophisticated code for a fairly long time. It's not that I'm new to
>> > software development.
>> >
>> > The column name rule is a good example. I'm willing to live with the rule
>> > that column names are not changed for efficiency's sake. What's difficult
>> > for me is that I never saw that rule anywhere before. Of course, I'm not
>> > an R expert. I've been using it for only a couple of months. But still, I
>> > would have expected to run into a rule like that.
>> >
>> > Worse, since the rule is in conflict with the explicit intent of
>> > cbind--one can name columns when using cbind; in fact the examples
>> > illustrate how to do it--it would really be nice if cbind would issue a
>> > warning when one attempts to rename a column in violation of that rule.
>> > Instead, cbind is silent, giving no hint about what went wrong.
>> >
>> Naming columns is not the explicit intent of cbind. The explicit
>> intent is to combine objects by columns. Please don't overstate the
>> case.
>>
>> While the examples for the generic show naming columns, neither
>> ?cbind.zoo nor ?cbind.xts has such examples. That's a hint.
>>
>> > It's those sorts of things that have caused me much frustration. And it's
>> > these sorts of things that seem pervasive in R. One never knows what one
>> > is dealing with. Did something not work because there is a special case
>> > rule that I haven't heard of? Did it not work because a special
>> > convenience was programmed into a function in a way that conflicted with
>> > normal use? Since these sorts of things seem to come up so often, I find
>> > myself feeling that there is no good way to track down problems, which
>> > leads to a sense of helplessness and confusion. That's not what one wants
>> > in a programming language.
>> >
>> If that's not what one wants, one can always write their own
>> programming language.
>>
>> Seriously, it seems like you want to rant more than understand what's
>> going on. You have the R and xts help pages and the source code. The
>> "Note" section of help(cbind) tells you that the method dispatch is
>> different. It even tells you what R source file to look at to see how
>> dispatching is done. Compare the relevant source files from
>> base::cbind and xts::cbind.xts, look at the "R Language Definition"
>> manual to see how method dispatch is normally done.
>>
>> But you've been writing quite sophisticated code for a fairly long
>> time, so I'm not telling you anything you don't know... you just don't
>> think you should have to do the legwork.
>>
>> > -- Russ
>> >
>> >
>>
>> --
>> Joshua Ulrich | FOSS Trading: www.fosstrading.com
>>
>>
>>
>> > On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan <jeff.a.ryan_at_gmail.com> wrote:
>> >
>> > > Hi Russ,
>> > >
>> > > Colnames don't get rewritten if they already exist. The reason is
>> > > performance and how cbind is written at the R level.
>> > >
>> > > It isn't perfect per se, but the complexity and variety of dispatch that
>> > > can take place for cbind in R, as it isn't a generic, is quite
>> > > challenging to get to behave as one may hope. After years of trying I'd
>> > > say it is nearly impossible to do what you want without causing horrible
>> > > memory issues on non-trivial objects. There are users in production
>> > > systems **using** xts on objects with billions of rows. Your simple case,
>> > > which has a simple workaround, would force everyone in the other 99.999%
>> > > of cases to pay a recurring cost that isn't tolerable.
>> > >
>> > > If this is frustrating to you, you should stop using the class.
>> > >
>> > > Jeff
>> > >
>> > > Jeffrey Ryan | Founder | jeffrey.ryan_at_lemnica.com
>> > >
>> > > www.lemnica.com
>> > >
>> > > On May 8, 2011, at 2:07 PM, Russ Abbott <russ.abbott_at_gmail.com> wrote:
>> > >
>> > > I'm having trouble with the names of columns.
>> > >
>> > > quantmod deals with stock quotes. I've created an array of the first 5
>> > > closing prices from Jan 2007. (Is there a problem that the name is the
>> > > same as the variable name? There shouldn't be.)
>> > >
>> > > > close
>> > >
>> > >              close
>> > > 2007-01-03 1416.60
>> > > 2007-01-04 1418.34
>> > > 2007-01-05 1409.71
>> > > 2007-01-08 1412.84
>> > > 2007-01-09 1412.11
>> > >
>> > >
>> > > When I try to create a more complex array by adding columns, the names
>> > > get fouled up. Here's a simple example.
>> > >
>> > > > cbind(changed.close = close+1, zero = 0, close)
>> > >
>> > >              close zero close.1
>> > > 2007-01-03 1417.60    0 1416.60
>> > > 2007-01-04 1419.34    0 1418.34
>> > > 2007-01-05 1410.71    0 1409.71
>> > > 2007-01-08 1413.84    0 1412.84
>> > > 2007-01-09 1413.11    0 1412.11
>> > >
>> > >
>> > > The first column should be called "changed.close", but it's called
>> > > "close". The second column has the right name. The third column should
>> > > be called "close" but it's called "close.1". Why is that? Am I missing
>> > > something?
>> > >
>> > > If I change the order of the columns and let close have its original
>> > > name, there is still a problem.
>> > >
>> > > > cbind(close, zero = 0, changed.close = close+1)
>> > >
>> > >              close zero close.1
>> > > 2007-01-03 1416.60    0 1417.60
>> > > 2007-01-04 1418.34    0 1419.34
>> > > 2007-01-05 1409.71    0 1410.71
>> > > 2007-01-08 1412.84    0 1413.84
>> > > 2007-01-09 1412.11    0 1413.11
>> > >
>> > >
>> > > Now the names on the first two columns are ok, but the third column is
>> > > still wrong. Again, why is that? Apparently it's not letting me assign a
>> > > name to a column that comes from something that already has a name. Is
>> > > that the way it should be?
>> > >
>> > > I don't get that same problem on a simpler example.
>> > >
>> > >
>> > > > IX <- cbind(I=0, X=(1:3))
>> > > > IX
>> > >
>> > >      I X
>> > > [1,] 0 1
>> > > [2,] 0 2
>> > > [3,] 0 3
>> > >
>> > > > cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
>> > >
>> > >      Y Z W
>> > > [1,] 1 0 1
>> > > [2,] 1 0 2
>> > > [3,] 1 0 3
>> > >
>> > >
>> > > Is this a peculiarity of xts objects?
>> > >
>> > > Thanks.
>> > >
>> > > -- Russ
>> > >
>> > > P.S. Once again I feel frustrated because it's taken me far more time
>> > > than it deserves to track down and characterize this problem. I can fix
>> > > it by using the names function. But I shouldn't have to do that.
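
A hedged sketch of the workaround mentioned in the P.S., assuming the xts package is installed. The prices are the five closing values shown in the question; the object name combined is made up for illustration. Rather than relying on the argument names passed to cbind, the columns are named in one step after binding, which does not depend on how cbind.xts treats existing colnames.

```r
library(xts)  # assumption: xts is installed

# Rebuild the five closing prices shown in the question.
dates <- as.Date(c("2007-01-03", "2007-01-04", "2007-01-05",
                   "2007-01-08", "2007-01-09"))
close <- xts(c(1416.60, 1418.34, 1409.71, 1412.84, 1412.11), order.by = dates)
colnames(close) <- "close"

# Bind first (same call as in the question), then set the column names explicitly.
combined <- cbind(changed.close = close + 1, zero = 0, close)
colnames(combined) <- c("changed.close", "zero", "close")
combined
```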
>> > >
>> > >
>> >
>> >
>> > ______________________________________________
>> > R-help_at_r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jeffrey Ryan
>
> jeffrey.ryan_at_lemnica.com
>
> www.lemnica.com
>




Received on Mon 09 May 2011 - 06:16:21 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 09 May 2011 - 07:10:06 GMT.
