Re: [R] Fast update of a lot of records in a database?

From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Date: Sat 20 May 2006 - 11:17:57 EST

On 5/19/2006 3:19 PM, hadley wickham wrote:

>> put the updates into a temporary table called updates
>>
>> UPDATE bigtable AS a
>>    FROM updates AS b
>>    WHERE a.id = b.id
>>    SET a.col1 = b.col1

>
> I don't think this will be any faster - why would creating a new table
> be faster than updating existing rows?

I don't know, but I assumed each SQL statement resulted in some sort of turnaround delay with the database. Creating the new table requires just a few statements, one of which is a huge INSERT statement).

I haven't had a chance yet to do the timing, but creating a test update table with 400000 new values hadn't finished in an hour.

> I've never had a problem with using large numbers of SQL update
> statements (in the order of hundreds of thousands) to update a table
> and having them complete in a reasonable time (a few minutes). How
> heavily indexed is the field you are updating?

The field being updated isn't indexed; the id field is.

You may be able to get
> some speed improvements by turning off indices before the update and
> back on again afterwards (highly dependent on your database system
> though).
>
> I would strongly suspect your bottleneck lies elsewhere (eg. when
> generating the statements in R, or using ODBC to send them)

It's not in R (that goes quickly), but I don't know how to distinguish ODBC slowdowns from database slowdowns. On my machine ODBC is the only option.

Part of the problem is probably that I am remote from the server, using ODBC over an SSH tunnel. But the original "thousands of UPDATES" was taking hours running on a machine much closer to the server. It was using RdbiPgsql, I'm using RODBC. Doing the SELECT takes me a couple of minutes, and is faster locally; why is UPDATE so slow?

Duncan



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat May 20 11:23:18 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 20 May 2006 - 12:10:13 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.