Re: [R] Another quantmod question

From: Jeff Ryan <jeff.a.ryan_at_gmail.com>
Date: Sun, 08 May 2011 22:23:09 -0500

Hi Russ,

We're of course getting into some incredibly fine-level detail on how all of this works. I'll try to explain the issues as I recall them from the development of xts and cbind.xts.

xts started as an extension of zoo. zoo is an extension of 'ts' (greatly simplified comparison of course, but stay with me)

Achim and Gabor have put tremendous effort into the design of zoo - with a primary focus on keeping it consistent with base R behavior. That is, try not to introduce unnecessary changes to the interface an R user is accustomed to. The logic being that this makes for a more consistent interface as well as an easier learning curve and hence greater/faster adoption rate.

'xts' extends this, though with a bit more flexibility in terms of consistency. Why? Simply put - some things about R annoyed me coming from a time-series background. Number one was the fact that lag() is backwards. Backwards from expectation, nearly all literature, and all standard definitions. So xts breaks with the base lag(x, k = 1) behavior. This is obviously confusing to some - but was the gamble I was willing to take - consistency (with R) be damned! ;-)
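To make the lag() difference concrete, here is a minimal sketch. It assumes the zoo and xts packages are installed; the dates and the variable names (`dates`, `z`, `x`, `lag_zoo`, `lag_xts`) are made up for illustration.

```r
library(zoo)
library(xts)

# Five consecutive daily observations with values 1..5 (illustrative data)
dates <- as.Date("2011-05-02") + 0:4
z <- zoo(1:5, dates)   # zoo follows the base R (ts) convention
x <- xts(1:5, dates)   # xts deliberately departs from it

# zoo: with k = 1 the value reported at time t is the observation from t+1
lag_zoo <- lag(z, k = 1)

# xts: with k = 1 the value reported at time t is the observation from t-1,
# and the first row is padded with NA
lag_xts <- lag(x, k = 1)

print(lag_zoo)
print(lag_xts)
```

The same call therefore shifts the data in opposite directions depending on the class, which is worth keeping in mind when converting between zoo and xts.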

So, now back to cbind. cbind and merge in zoo-land (and xts by extension) are the same. This isn't the case for other classes that use these - but that is 'allowable' and 'expected' under a class dispatch system. The docs for ?cbind state:

     For ‘cbind’ (‘rbind’) the column (row) names are taken from the
     ‘colnames’ (‘rownames’) of the arguments if these are matrix-like.
     Otherwise from the names of the arguments or where those are not
     supplied and ‘deparse.level > 0’, by deparsing the expressions
     given, for ‘deparse.level = 1’ only if that gives a sensible name
     (a ‘symbol’, see ‘is.symbol’).

Based on that, I'd argue that xts does it "right". Of course I'll also point out that this is incorrect thinking as well - since this is a description for the generic - and not for xts. But again in a highly configurable object/class system, where you start to make a distinction of right and wrong is itself up for debate.
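The rule quoted from ?cbind can be checked with base R alone. The following is a small sketch with made-up objects (`m`, `v`, and the result names are illustrative): a matrix-like argument keeps its own colnames even when an argument name is supplied, while a plain vector takes the argument name or, failing that, the deparsed symbol.

```r
# A one-column matrix that already carries a column name
m <- matrix(1:3, ncol = 1, dimnames = list(NULL, "close"))
# A plain vector with no names of its own
v <- 4:6

named_vec <- cbind(renamed = v)  # vector: the argument name is used
named_mat <- cbind(renamed = m)  # matrix-like: existing colnames win
by_symbol <- cbind(v, m)         # vector named by its symbol, matrix by its colnames

print(colnames(named_vec))  # "renamed"
print(colnames(named_mat))  # "close"
print(colnames(by_symbol))  # "v" "close"
```

Seen this way, an xts object that keeps "close" when passed as `changed.close = close + 1` is behaving like the matrix case here.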

At the other end of the argument spectrum is _why not_. That is, why can't cbind.xts use the argument names to replace the colnames of the objects passed in? Here is where I'll point out that I am really just going by memory.

Three major items are involved in cbind. One is that dispatch is quite unlike nearly every other dispatch in R. This is a fact - nothing to do with xts.

   do.call(cbind,
   do.call(cbind.xts,
   cbind,
   cbind.xts,
   merge,
   merge.xts,
   do.call(merge,
   do.call(merge.xts

The rules of dispatch on cbind are really at a level that R-help has no business discussing. The second part is where things actually get tricky though. They all behave differently with respect to how args are handled - when eval'd, etc.

I'm sure you have read how R strains itself on 'big data'. This is true and false. Improper use (or just naive use) can cause object copies in places you really don't want. Much of xts at this point is implemented in custom C code. The gain here is that you can make it eas(ier) to avoid copies until you need them by writing in C. Obvious, but needs to be said.

To figure out what the columns have - and if names are attached to the objects in the pairlist (the "..." in this context) - you have to be very careful. Touch anything in the wrong place or at the wrong time and you lose a figurative arm and leg to memory copies. So, in 99.9999% of cases - where you aren't naming (which would be an extra feature above and beyond c(olumn) binding [the reason for cbind]) - you run a very real risk of getting nailed for copies you don't want. On 10MM obs that is almost manageable. On hundreds of millions or billions - it is kill -9 time.

To compound the issue - recall all of those different dispatch methods. Yep - they all behave just a bit differently. How? Honestly - I don't know or care. I simply know you can't easily make the behavior consistent amongst those calls. I have tried. And tried.
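In practice the simple workaround is to bind first and assign the names afterwards, which is what the original question mentions as well. A minimal sketch, assuming xts is installed; the prices are the first three closes from the original post and the object names (`close`, `out`) are illustrative:

```r
library(xts)

# Rebuild a small 'close' series like the one in the original question
close <- xts(
  matrix(c(1416.60, 1418.34, 1409.71), ncol = 1,
         dimnames = list(NULL, "close")),
  order.by = as.Date(c("2007-01-03", "2007-01-04", "2007-01-05"))
)

# Bind the columns; whatever names cbind.xts picks here are not relied upon
out <- cbind(changed.close = close + 1, zero = 0, close)

# Assign the intended column names explicitly after the fact
colnames(out) <- c("changed.close", "zero", "close")

print(out)
```

Setting colnames on the result only touches the names attribute of the finished object, so nothing inside cbind has to inspect the argument names.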

End of day, and a very long R-help email, xts is different than base R. It is even different than its 'parent' zoo behavior. But in exchange for this difference (and bit of learning/adjustment) you get a class that is faster than anything else.

Period.

> x <- .xts(1:1e7, 1:1e7) # our time series object
> m <- coredata(x) # a matrix

> str(x)

An ‘xts’ object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
  Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
  Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
  xts Attributes:
 NULL
> str(m)

 int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...

> system.time(x[,1]) # get the first column

   user system elapsed
  0.017 0.000 0.017
> system.time(m[,1]) # ditto

   user system elapsed
  0.152 0.000 0.153

Yep, nearly 10x faster than a matrix op - AND you still have the time index. To get there you need to sometimes make sacrifices. xts does, though I like to think they are well thought out and consistent*

*enough ;-)

Best,
Jeff

On Sun, May 8, 2011 at 8:57 PM, Joshua Ulrich <josh.m.ulrich_at_gmail.com> wrote:

> Russ,
>
> On May 8, 2011 6:29 PM, "Russ Abbott" <russ.abbott_at_gmail.com> wrote:
> >
> > Hi Jeff,
> >
> > The xts class has some very nice features, and you have done a valuable
> > service in developing it.
> >
> > My primary frustration is how difficult it seems to be to find out what
> > went wrong when my code doesn't work. I've been writing quite
> > sophisticated code for a fairly long time. It's not that I'm new to
> > software development.
> >
> > The column name rule is a good example. I'm willing to live with the rule
> > that column names are not changed for efficiency's sake. What's difficult
> > for me is that I never saw that rule anywhere before. Of course, I'm not
> > an R expert. I've been using it for only a couple of months. But still, I
> > would have expected to run into a rule like that.
> >
> > Worse, since the rule is in conflict with the explicit intent of cbind--one
> > can name columns when using cbind; in fact the examples illustrate how to
> > do it--it would really be nice if cbind would issue a warning when one
> > attempts to rename a column in violation of that rule. Instead, cbind is
> > silent, giving no hint about what went wrong.
> >
> Naming columns is not the explicit intent of cbind. The explicit
> intent is to combine objects by columns. Please don't overstate the
> case.
>
> While the examples for the generic show naming columns, neither
> ?cbind.zoo nor ?cbind.xts has such examples. That's a hint.
>
> > It's those sorts of things that have caused me much frustration. And it's
> > these sorts of things that seem pervasive in R. One never knows what one
> > is dealing with. Did something not work because there is a special case
> > rule that I haven't heard of? Did it not work because a special
> > convenience was programmed into a function in a way that conflicted with
> > normal use? Since these sorts of things seem to come up so often, I find
> > myself feeling that there is no good way to track down problems, which
> > leads to a sense of helplessness and confusion. That's not what one wants
> > in a programming language.
> >
> If that's not what one wants, one can always write their own
> programming language.
>
> Seriously, it seems like you want to rant more than understand what's
> going on. You have the R and xts help pages and the source code. The
> "Note" section of help(cbind) tells you that the method dispatch is
> different. It even tells you what R source file to look at to see how
> dispatching is done. Compare the relevant source files from
> base::cbind and xts::cbind.xts, look at the "R Language Definition"
> manual to see how method dispatch is normally done.
>
> But you've been writing quite sophisticated code for a fairly long
> time, so I'm not telling you anything you don't know... you just don't
> think you should have to do the legwork.
>
> > -- Russ
> >
> >
>
> --
> Joshua Ulrich | FOSS Trading: www.fosstrading.com
>
>
>
> > On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan <jeff.a.ryan_at_gmail.com> wrote:
> >
> > > Hi Russ,
> > >
> > > Colnames don't get rewritten if they already exist. The reason is
> > > performance and how cbind is written at the R level.
> > >
> > > It isn't perfect per se, but the complexity and variety of dispatch that
> > > can take place for cbind in R, as it isn't a generic, is quite
> > > challenging to get to behave as one may hope. After years of trying I'd
> > > say it is nearly impossible to do what you want without causing horrible
> > > memory issues on non-trivial objects. There are production systems
> > > **using** xts on objects with billions of rows. Your simple case, which
> > > has a simple workaround, would force everyone in the other 99.999% of
> > > cases to pay a recurring cost that isn't tolerable.
> > >
> > > If this is frustrating to you, you should stop using the class.
> > >
> > > Jeff
> > >
> > > Jeffrey Ryan | Founder | jeffrey.ryan_at_lemnica.com
> > >
> > > www.lemnica.com
> > >
> > > On May 8, 2011, at 2:07 PM, Russ Abbott <russ.abbott_at_gmail.com> wrote:
> > >
> > > I'm having trouble with the names of columns.
> > >
> > > quantmod deals with stock quotes. I've created an array of the first 5
> > > closing prices from Jan 2007. (Is there a problem that the name is the
> > > same as the variable name? There shouldn't be.)
> > >
> > > > close
> > >
> > >              close
> > > 2007-01-03 1416.60
> > > 2007-01-04 1418.34
> > > 2007-01-05 1409.71
> > > 2007-01-08 1412.84
> > > 2007-01-09 1412.11
> > >
> > >
> > > When I try to create a more complex array by adding columns, the names
> > > get fouled up. Here's a simple example.
> > >
> > > > cbind(changed.close = close+1, zero = 0, close)
> > >
> > >              close zero close.1
> > > 2007-01-03 1417.60    0 1416.60
> > > 2007-01-04 1419.34    0 1418.34
> > > 2007-01-05 1410.71    0 1409.71
> > > 2007-01-08 1413.84    0 1412.84
> > > 2007-01-09 1413.11    0 1412.11
> > >
> > >
> > > The first column should be called "changed.close", but it's called
> > > "close". The second column has the right name. The third column should
> > > be called "close" but it's called "close.1". Why is that? Am I missing
> > > something?
> > >
> > > If I change the order of the columns and let close have its original
> > > name, there is still a problem.
> > >
> > > > cbind(close, zero = 0, changed.close = close+1)
> > >
> > >              close zero close.1
> > > 2007-01-03 1416.60    0 1417.60
> > > 2007-01-04 1418.34    0 1419.34
> > > 2007-01-05 1409.71    0 1410.71
> > > 2007-01-08 1412.84    0 1413.84
> > > 2007-01-09 1412.11    0 1413.11
> > >
> > >
> > > Now the names on the first two columns are ok, but the third column is
> > > still wrong. Again, why is that? Apparently it's not letting me assign a
> > > name to a column that comes from something that already has a name. Is
> > > that the way it should be?
> > >
> > > I don't get that same problem on a simpler example.
> > >
> > >
> > > > IX <- cbind(I=0, X=(1:3))
> > > > IX
> > >
> > >      I X
> > > [1,] 0 1
> > > [2,] 0 2
> > > [3,] 0 3
> > >
> > > > cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
> > >
> > >      Y Z W
> > > [1,] 1 0 1
> > > [2,] 1 0 2
> > > [3,] 1 0 3
> > >
> > >
> > > Is this a peculiarity to xts objects?
> > >
> > > Thanks.
> > >
> > > -- Russ
> > >
> > > P.S. Once again I feel frustrated because it's taken me far more time
> > > than it deserves to track down and characterize this problem. I can fix
> > > it by using the names function. But I shouldn't have to do that.
> > >
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jeffrey Ryan
jeffrey.ryan_at_lemnica.com

www.lemnica.com


Received on Mon 09 May 2011 - 03:30:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 09 May 2011 - 06:40:06 GMT.
