Re: [R] RODBC sqlQuery insert slow

From: Michel Lang <michellang_at_gmail.com>
Date: Fri 13 Oct 2006 - 15:19:09 GMT

> I am trying to insert a lot of data into a table using windows R (2.3.1)
> and a mysql database via RODBC.
> First I read a file with read.csv and then form sql insert statements
> for each row and execute the insert query one row at a time. See the
> loop below.
> This turns out to be very slow.
> Can anyone please suggest a way to speed it up?

A few weeks ago I had to solve a similar task. Inserting each row turned out to be horrible slow due to paste() and the data.frame-indexing. The estimated runtime would have been over 3 weeks, so I used MySQLs LOAD DATE INFILE syntax to speed things up. You must have FILE_PRIV = 'Y' set in the mysql.user-table to use this small hack, and I'm not that sure that this runs remotely. It is also assumed that your df has valid column-names.

tmp_filename <- tempfile()

write.table(df, tmp_filename, na = "\\N", row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")

query <- paste(

    "LOAD DATA LOCAL INFILE '", tmp_filename, "'",     " INTO TABLE ", your_table, " (", toString(names(df)), ");", sep = "")

sqlQuery(channel, query)
unlink(tmp_filename)

The total runtime for the LOAD DATA INFILE querys was something below 5 minutes, inserting 3e+6 rows with > 200 columns.

Michel Lang



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat Oct 14 02:26:20 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 13 Oct 2006 - 17:30:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.