[R] R and databases - a comment

From: charles loboz <charles_loboz_at_yahoo.com>
Date: Sun 04 Dec 2005 - 15:06:41 EST

  1. That was a part of a private email exchange. It has been suggested that more people may be interested.
  2. I did use various databases (significant part of my job) for the last 15 years. Some with R for the last 3 years as a hobby. Some comments on the ones used below. Sorry, no links - I am time-constrained at the moment - please google if interested in details. The remarks are from the point of view of R user, not that of 'general database user'.
  3. SQLITE. www.sqlite.org - probably the best datase to use with R. No setup, no administration, embedded - so less connection overhead. All data in one file - so easy to transfer. Solid. Very functional SQL, fast if you play it right (almost as fast as SQLServer on Windows...) . Some limitations - no stored procedures. Some preprocessing/parsing can be done using TCL - well integrated with sqlite if you need that. Due to the implementation quirk you can even compute recursive functions (like exponential moving average or Fibonacci numbers) with SQL :-). Easy import/export of data to text files. After trying few other dbs I settled down on this one. Even considered writing a tutorial on SQLite use with R (like how to process gigabytes of data on a 128mb computer :-) ) - but time constraints stopped me. [Personally I think that SQLite should come bundled with the standard R installation. Could even be used to keep a lot of R's internal stuff, would probably simplify overall coding. But that is for others to decide]

All other databases (including mysql) require typical setup - installation, administration, user rights, keeping track of ports, services/daemons, directories, backups etc - so some db administrative skills are required.I am not sure how many R users are willing to go through that. The ones who may be interested in the stuff below  

4. www.postgres.org Postgres. Free. As complete as one can wish, small download, great functionality. Interfaces well to other languages, so you can do numerics in C++ and store that in the database (though why not do numerics in R?). Current version 8.1, much improved.  

5. Firebird. open source verion of Interbase. Easy setup and can have all data in one file. But... slow development - not many developers there. SQL full but somewhat quirky (when porting from other dialects).  

6. Mysql. the inheritance from the original ISAM system still shows. Nice user interface, but... if you need real db why not use postgres? if you need something simpler, without administration, why not use SQLITE? No doubt mysql is fine for many simple websites etc - this is mysql's niche.  

7. derby and hsqldb. both are written in Java, open source. HSQLDB (used now by OpenOffice) allows creation of in-memory tables and it's fast there - but it's usage from inside R is tricky - there is no easily available, installable and current ODBC driver. Similar for derby - the ODBC driver is there, but installation can be tricky to non-professionals. May be in the future...  

There are three 'express' versions of commercial databases. They all share some restrictions, like max disc data size 2-4gb, max mem size 1-2gb and usage of single processor only. Plus various licensing restrictions, so be careful how you use them.  

Each commercial DB has some OLAP capability, but I am not sure how much of it is/will be available in the Express version.                 

Just $16.99/mo. or less.

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sun Dec 04 15:13:16 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:41:28 EST