From: Joshua Wiley <jwiley.psych_at_gmail.com>

Date: Fri, 04 Jun 2010 00:01:03 -0700

Date: Fri, 04 Jun 2010 00:01:03 -0700

I am not exactly sure how your filtering code is working, but take a look at

?na.omit

You will probably need a few additional steps if you want to remove all rows related to a particular id. Also look at ?subset which is a good general way to subset your data.

Josh

On Thu, Jun 3, 2010 at 11:45 PM, Jeffery Ding <jefferyding_at_gmail.com> wrote:

*> Thanks, you have been tremendously helpful!
**> I will be able to implement option 2, after I filter out stocks with
**> incomplete data sets.
**>
**> So far, for my filtering code I have:
**>
**> ##Filtering
**>
**> x<-length(unique(Returns$date_))
**> y<-unique(Returns$id)
**> Returns.filter<-Returns
**>
**> i<-1
**>
**> while(i<=length(y)) {
**> a<-sum(Returns$id==y[i])
**> if(a<x) {
**> ##need code that will remove all rows with id a
**> }
**> i<-i+1
**> }
**>
**>
**>
**> On Fri, Jun 4, 2010 at 2:40 PM, Joshua Wiley <jwiley.psych_at_gmail.com> wrote:
**>>
**>> Hey Jeff,
**>>
**>> I have a few ideas. Each has some different requirements, and to help
*

>> you choose, I bench marked them.

*>>
**>>
**>> ###START###
**>>
**>> ##Basic data
**>> > test <- data.frame(totret=rnorm(10^7), id=rep(1:10^4, each=10^3),
**>> > time=rep(c(1, rep(0, 999)), 10^4))
**>>
**>> ##Option 1: probably the most general, but also the slowest by far.
**>> ##The idea is it does the calculation for each stock/ID, and then
**>> concatenates [c()] an NA in front.
**>>
**>> > system.time(test[,"dailyreturns"] <- unlist(by(test[,"totret"],
**>> > test[,"id"], function(x) {c(NA, x[-1]/x[-length(x)])})), gcFirst=TRUE)
**>> user system elapsed
**>> 49.11 0.42 49.86
**>>
**>> ##Option 2: Assumes that you have the same number of measurements for
**>> each stock/ID so you can just assign an NA every nth row.
**>> ##This is fairly fast
**>>
**>> > system.time(test[-1,"dailyreturns"] <-
**>> > test[-1,"totret"]/test[-nrow(test),"totret"], gcFirst=TRUE)
**>> user system elapsed
**>> 1.11 0.21 1.31
**>> > system.time(test[seq(1, 10^7, by=10^3),"dailyreturns"] <- NA,
**>> > gcFirst=TRUE)
**>> user system elapsed
**>> 0.39 0.04 0.42
**>>
**>> ##Option 3: Assumes that you have some variable (time in my little
**>> test data) that somehow indicates when each stock/ID has its first
**>> measurement. In the example, the first measurement gets a 1 and
**>> subsequent ones a 0. So we just assign NA in 'dailyreturns' everytime
**>> the other "time" column has a 1. Again, a big assumption, but fairly
**>> quick.
**>>
**>> > system.time(test[-1,"dailyreturns"] <-
**>> > test[-1,"totret"]/test[-nrow(test),"totret"], gcFirst=TRUE)
**>> user system elapsed
**>> 1.06 0.17 1.25
**>> > system.time(test[which(test[,"time"]==1),"dailyreturns"] <- NA,
**>> > gcFirst=TRUE)
**>> user system elapsed
**>> 0.46 0.09 0.55
**>>
**>> ###END###
**>>
**>> I really feel like there should be a faster way that is also more
**>> general, but it is late and I am not coming up with any better ideas
**>> at the moment. Perhaps somehow finding the first instance of a
**>> stock/ID? Anyway, this was simulated on 10 million rows, so maybe
**>> by() works plenty fast for you.
**>>
**>> Josh
**>>
**>>
**>> On Thu, Jun 3, 2010 at 10:20 PM, Jeff08 <jefferyding_at_gmail.com> wrote:
**>> >
**>> > Hey Josh,
**>> >
**>> > Thanks for the quick response!
**>> >
**>> > I guess I have to switch from the Java mindset to the matrix/vector
**>> > mindset
**>> > of R.
**>> >
**>> > Your code worked very well, but I just have one problem:
**>> >
**>> > Essentially I have a time series of stock A, followed by a time series
**>> > of
**>> > stock B, etc.
**>> > So there are break points in the data (the points where it switches
**>> > stocks
**>> > have incorrect returns, and should be NA at t=0 for each stock)
**>> >
**>> > Is there an easy way to account for this in R? What I was thinking of is
**>> > if
**>> > there is a way to make a filter rule. Such as if the ID of the row
**>> > matches
**>> > Stock A, then perform this.
**>> >
**>> >>>"Hello Jeff,
**>> >
**>> > Try this:
**>> >
**>> > test <- data.frame(totret=rnorm(10^7)) #create some sample data
**>> > test[-1,"dailyreturn"] <- test[-1,"totret"]/test[-nrow(test),"totret"]
**>> >
**>> > The general idea is to take the column "totret" excluding the first 1,
**>> > dividided by "totret" exluding the last row. This gives in effect t+1
**>> > (since t is now shorter)/t
**>> >
**>> > I assigned the result to a new column "dailyreturn". For 10^7 rows,
**>> > it tooks 1.92 seconds on my system."
**>> > --
**>> > View this message in context:
**>> > http://r.789695.n4.nabble.com/R-Newbie-please-help-tp2242633p2242703.html
**>> > Sent from the R help mailing list archive at Nabble.com.
**>> >
**>> > ______________________________________________
**>> > R-help_at_r-project.org mailing list
**>> > https://stat.ethz.ch/mailman/listinfo/r-help
**>> > PLEASE do read the posting guide
**>> > http://www.R-project.org/posting-guide.html
**>> > and provide commented, minimal, self-contained, reproducible code.
**>> >
**>>
**>>
**>>
**>> --
**>> Joshua Wiley
**>> Senior in Psychology
**>> University of California, Riverside
**>> http://www.joshuawiley.com/
**>
**>
**>
**> --
**> Jeffery Ding
**> Duke University, Class of 2012
**> (224) 622-3398 | jd116_at_duke.edu
**>
*

-- Joshua Wiley Senior in Psychology University of California, Riverside http://www.joshuawiley.com/ ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Fri 04 Jun 2010 - 07:03:43 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 04 Jun 2010 - 07:40:26 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*