Re: [Rd] Why does my RPy2 program run faster on Windows?

From: Abhijit Bera <abhibera_at_gmail.com>
Date: Wed, 02 Jun 2010 14:00:52 +0530

Hi

This problem is fixed. I was running an older kernel, 2.6.28. After upgrading to 2.6.32-5 (Debian Sid/Squeeze), the performance on Linux is similar to that on Windows.
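
For anyone making a similar cross-platform comparison, here is a minimal sketch of how the kernel/OS release could be recorded next to each timing. It is not the code used in this thread: the names `platform_summary` and `timed` are hypothetical, and it is written for Python 3 (`time.perf_counter`), unlike the Python 2 code quoted below.

```python
import platform
import time
from contextlib import contextmanager


def platform_summary():
    """Return the OS name, kernel/OS release and Python version as a dict."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
    }


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("dummy_work", timings):
        # Stand-in workload; in practice this would be the R call being measured.
        sum(i * i for i in range(100000))
    print(platform_summary())
    print(timings)
```

Logging the release string with every measurement makes it easier to spot later that two sets of numbers came from different kernels.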

Regards

Abhijit Bera

On Wed, May 19, 2010 at 6:49 PM, Carlos J. Gil Bellosta <cgb_at_datanalytics.com> wrote:

> Dear Abhijit,
>
> If you think that table.CAPM is the culprit, you could call that
> function directly in R on both platforms under Rprof to see which part
> of it is the bottleneck.
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
>
> 2010/5/19 Abhijit Bera <abhibera_at_gmail.com>:
> > Update: it appears that the data conversion is not where most of the
> > time goes. Most of the time is spent in the CAPM calculation. :( Does
> > anyone know why the CAPM calculation would be faster on Windows?
> >
> > On Wed, May 19, 2010 at 5:51 PM, Abhijit Bera <abhibera_at_gmail.com> wrote:
> >
> >> Hi
> >>
> >> This is my function. It serves an HTML page after the calculations. I'm
> >> connecting to a MSSQL DB using pyodbc.
> >>
> >> # Imports the method relies on (Python 2, as used with rpy2 2.1.0)
> >> from datetime import datetime
> >> from urllib import unquote
> >> import rpy2.robjects as robjects
> >>
> >> def CAPM(self, client):
> >>     r = self.r
> >>
> >>     # Defaults, overridden by the "cds" and "bm" GET parameters
> >>     cds = "1590"
> >>     bm = "20559"
> >>
> >>     d1 = []  # dates
> >>     v1 = []  # closing prices for cds
> >>     v2 = []  # closing prices for bm
> >>
> >>     print("Parsing GET Params")
> >>     params = client.g[1].split("&")
> >>     for items in params:
> >>         item = items.split("=")
> >>         if item[0] == "cds":
> >>             cds = unquote(item[1])
> >>         elif item[0] == "bm":
> >>             bm = unquote(item[1])
> >>
> >>     print("cds: %s bm: %s" % (cds, bm))
> >>     print("Fetching data")
> >>
> >>     t3 = datetime.now()
> >>
> >>     # NOTE: cds and bm come straight from the request and are
> >>     # interpolated into the SQL text, so they should be validated first.
> >>     sql = ("select * from (select * from ( "
> >>            "select co_code,dlyprice_date,dlyprice_close from feed_dlyprice P where "
> >>            "co_code in (%s,%s) ) DataTable PIVOT ( max(dlyprice_close) FOR co_code IN "
> >>            "([%s],[%s]) )PivotTable ) a order by dlyprice_date")
> >>     for row in self.cursor.execute(sql % (cds, bm, cds, bm)):
> >>         d1.append(str(row[0]))
> >>         v1.append(row[1])
> >>         v2.append(row[2])
> >>
> >>     t4 = datetime.now()
> >>
> >>     # The "R" timing below includes the Python-to-R conversions
> >>     t1 = datetime.now()
> >>     print("Calculating")
> >>
> >>     d1.pop(0)
> >>     d1vec = robjects.StrVector(d1)
> >>     v1vec = robjects.FloatVector(v1)
> >>     v2vec = robjects.FloatVector(v2)
> >>
> >>     r1 = r('Return.calculate(%s)' % v1vec.r_repr())
> >>     r2 = r('Return.calculate(%s)' % v2vec.r_repr())
> >>
> >>     tl = robjects.rlc.TaggedList([r1, r2], tags=('Geo', 'Nifty'))
> >>     df = robjects.DataFrame(tl)
> >>
> >>     ts2 = r.timeSeries(df, d1vec)
> >>     tsa = r.timeSeries(r1, d1vec)
> >>     tsb = r.timeSeries(r2, d1vec)
> >>
> >>     robjects.globalenv["ta"] = tsa
> >>     robjects.globalenv["tb"] = tsb
> >>     robjects.globalenv["t2"] = ts2
> >>     a = r('table.CAPM(ta,tb)')
> >>
> >>     t2 = datetime.now()
> >>
> >>     page = ("<html><title>CAPM</title><body>Result:<br>%s<br>Time taken by "
> >>             "DB:%s<br>Time taken by R:%s<br>Total time elapsed:%s<br></body></html>"
> >>             % (str(a), str(t4 - t3), str(t2 - t1), str(t2 - t3)))
> >>     print("Serving page:")
> >>     #print page
> >>
> >>     self.serveResource(page, "text", client)
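
The hand-rolled GET parsing in the function above could also be done with the standard library. A minimal sketch, assuming Python 3 (`urllib.parse`) and assuming that both codes are purely numeric, which the defaults suggest but the thread does not state. The name `parse_capm_params` is hypothetical.

```python
import re
from urllib.parse import parse_qs


def parse_capm_params(query, default_cds="1590", default_bm="20559"):
    """Return (cds, bm) from a query string, falling back to the defaults."""
    params = parse_qs(query)  # also decodes percent-escapes
    cds = params.get("cds", [default_cds])[0]
    bm = params.get("bm", [default_bm])[0]
    # Assumption: codes are numeric. Rejecting anything else matters because
    # both values are later interpolated into the SQL text.
    for name, value in (("cds", cds), ("bm", bm)):
        if not re.fullmatch(r"[0-9]+", value):
            raise ValueError("%s must be numeric, got %r" % (name, value))
    return cds, bm


if __name__ == "__main__":
    print(parse_capm_params("cds=500&bm=20559"))  # ('500', '20559')
    print(parse_capm_params(""))                  # ('1590', '20559')
```

Validating before building the query keeps the PIVOT column names, which cannot be passed as bound parameters, limited to digits.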
> >>
> >>
> >>
> >> On Linux
> >> Time taken by DB:0:00:00.024165
> >> Time taken by R:0:00:05.572084
> >> Total time elapsed:0:00:05.596288
> >>
> >> On Windows
> >> Time taken by DB:0:00:00.112000
> >> Time taken by R:0:00:02.355000
> >> Total time elapsed:0:00:02.467000
> >>
> >> Why is there such a huge difference in the time taken by R on the two
> >> platforms? Am I doing something wrong? It's my first Rpy2 code so I
> >> guess it's badly written.
> >>
> >> I'm loading the following libraries:
> >> 'PerformanceAnalytics','timeSeries','fPortfolio','fPortfolioBacktest'
> >>
> >> I'm using Rpy2 2.1.0 and R 2.11
> >>
> >> Regards
> >>
> >> Abhijit Bera
> >>
> >>
> >>
> >>
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Wed 02 Jun 2010 - 08:33:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Jun 2010 - 17:50:56 GMT.
