Re: [R] Can RMySQL be used for a paramterized query?

From: Ted Byers <r.ted.byers_at_gmail.com>
Date: Thu, 10 Jun 2010 18:11:38 -0400

Thanks,

Actually, thanks to the info Henrique sent, I have made decent progress.

In actuality, I could have just submitted a SELECT * on the second table, which would give me everything, just like Henrique's suggestion, and yours, would give me. The problem is that that table is HUGE (I don't want to load ALL that data at once, especially when I'd be analyzing it in chunks defined by ID and date), and at the same time, the analyses would be done not only one ID at a time, but on records pertaining to a given day. (e.g., imagine a dataset containing al sales and refund data, and assuming rates at which sales end in refunds vary through time, something I know from previous analyses of similar data, I would need to analyze all refunds for sales that happened on a given day).

While I was aware I could use RMySQL to get my time series data (I will be assessing a VAR on a 3D time series once my current task is done), I looked at TSMySQL because, being relatively inexperienced with R, I need to be able to do a variety of autoregressive analyses. Someone has suggested I also look at state space modelling, but being a mathematical ecologst by training, I am struggling with that along with Kalman Filtering. But that is another post ...

Thanks again

Ted

On Thu, Jun 10, 2010 at 5:51 PM, Paul Gilbert < pgilbert_at_bank-banque-canada.ca> wrote:

> Ted
>
> I'm not sure I fully understand the question, but you may want to consider
> creating a temporary table with a join, which you can do with a query from
> your R session, and then query that table to bring the data into R. Roughly,
> the logic is to leave the data in the db if you are not doing any fancy
> calculations. You might also find "order by" is useful. (This is what I use
> in TSdbi to make sure data comes back in the right order as a time series.)
> It may even be possible to get everything you want back in one step using
> this and "group by", rather than looping, but depending on the analysis you
> want to do in R, that may not be the most convenient way.
>
> BTW, I think you realize you do not have to use the TSMySQL commands to
> access the TSMySQL database. They are usually convenient, but you can query
> the tables directly with RMySQL functions.
>
> Paul
>
> >-----Original Message-----
> >From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
> >On Behalf Of Henrique Dallazuanna
> >Sent: June 10, 2010 8:47 AM
> >To: Ted Byers
> >Cc: R-help Forum
> >Subject: Re: [R] Can RMySQL be used for a paramterized query?
> >
> >I think you can do this:
> >
> >ids <- dbGetQuery(conn, "SELECT id FROM my_table") other_table <-
> >dbGetQuery(conn, sprintf("SELECT * FROM my_other_table WHERE t1_id in
> >(%s)", paste(ids, collapse = ",")))
> >
> >On Wed, Jun 9, 2010 at 11:24 PM, Ted Byers <r.ted.byers_at_gmail.com>
> >wrote:
> >
> >> I have not found anything about this except the following from the DBI
> >> documentation :
> >>
> >> Bind variables: the interface is heavily biased towards queries, as
> >> opposed
> >> > to general
> >> > purpose database development. In particular we made no attempt to
> >> > define "bind variables"; this is a mechanism by which the contents
> >> > of R/S objects are implicitly moved to the database during SQL
> >> > execution. For instance, the following embedded SQL statement
> >> > /* SQL */
> >> > SELECT * from emp_table where emp_id = :sampleEmployee would take
> >> > the vector sampleEmployee and iterate over each of its
> >> elements
> >> > to get the result. Perhaps the DBI could at some point in the future
> >> > implement this feature.
> >> >
> >>
> >> I can connect, and execute a SQL query such as "SELECT id FROM
> >> my_table", and display a frame with all the IDs from my_table. But I
> >> need also to do something like "SELECT * FROM my_other_table WHERE
> >> t1_id = x" where 'x' is one of the IDs returned by the first select
> >> statement. Actually, I have to do this in two contexts, one where the
> >> data are not ordered by time and one where it is (and thus where I'd
> >> have to use TSMySQL to execute something like "SELECT
> >record_datetime,value FROM my_ts_table WHERE t2_id = x").
> >>
> >> I'd like to embed this in a loop where I iterate over the IDs returned
> >> by the first select, get the appropriate data from the second for each
> >> ID, analyze that data and store results in another table in the DB,
> >> and then proceed to the next ID in the list. I suppose an alternative
> >> would be to get all the data at once, but the resulting resultset
> >> would be huge, and I don't (yet) know how to take a subset of the data
> >> in a frame based on a given value in one ot the fields and analyze
> >> that. Can you point me to an example of how this is done, or do I
> >> have to use a mix of perl (to get the
> >> data) and R (to do the analysis)?
> >>
> >> Any insights on how to proceed would be appreciated. Thanks.
> >>
> >> Ted
> >>
> >> [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> R-help_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> >--
> >Henrique Dallazuanna
> >Curitiba-Paraná-Brasil
> >25° 25' 40" S 49° 16' 22" O
> >
> > [[alternative HTML version deleted]]
>
>
> ====================================================================================
>
> La version française suit le texte anglais.
>
>
> ------------------------------------------------------------------------------------
>
> This email may contain privileged and/or confidential information, and the
> Bank of
> Canada does not waive any related rights. Any distribution, use, or copying
> of this
> email or the information it contains by other than the intended recipient
> is
> unauthorized. If you received this email in error please delete it
> immediately from
> your system and notify the sender promptly by email that you have done so.
>
>
> ------------------------------------------------------------------------------------
>
> Le présent courriel peut contenir de l'information privilégiée ou
> confidentielle.
> La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute
> diffusion,
> utilisation ou copie de ce courriel ou des renseignements qu'il contient
> par une
> personne autre que le ou les destinataires désignés est interdite. Si vous
> recevez
> ce courriel par erreur, veuillez le supprimer immédiatement et envoyer sans
> délai à
> l'expéditeur un message électronique pour l'aviser que vous avez éliminé de
> votre
> ordinateur toute copie du courriel reçu.
>
>

-- 
R.E.(Ted) Byers, Ph.D.,Ed.D.
TED_at_MERCHANTSERVICECORP.COM
CTO
Merchant Services Corp.
350 Harry Walker Parkway North, Suite 8
Newmarket, Ontario
L3Y 8L3

	[[alternative HTML version deleted]]


______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Thu 10 Jun 2010 - 22:13:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 10 Jun 2010 - 22:20:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive