Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

From: Seth Falcon <>
Date: Tue, 01 Mar 2011 11:56:53 -0800

On Tue, Mar 1, 2011 at 10:06 AM, chen jia <> wrote:
> Hi Seth,
> Thanks so much for identifying the problem and explaining everything.
> I think the first solution that you suggest--make sure the schema has
> well defined types--would work the best for me. But, I have one
> question about how to implement it, which is more about sqlite itself.
> First, I found out that the columns that don't have the expected data
> types in the table annual_data3 are created by aggregate functions in
> a separate table. These columns are later combined with other columns
> that do.
> I read the link that you provide,
> One paragraph says "When
> grouping values with the GROUP BY clause values with different storage
> classes are considered distinct, except for INTEGER and REAL values
> which are considered equal if they are numerically equal. No
> affinities are applied to any values as the result of a GROUP by
> clause."
> If I understand it correctly, the columns created by aggregate
> functions with a GROUP by clause do not have any expected data types.
> My solution is to use CREATE TABLE clause to declare the expected
> datatype and then insert the values of columns created by the
> aggregate functions with the GROUP by clause. However, this solution
> requires a CREATE TABLE cause every time the aggregate function and
> the GROUP by clause is used.
> My question is: Is this the best way to make sure that the columns as
> a result of a GROUP by clause have the expected data types? Thanks.

That might be a good question to post to the SQLite user's list :-)

I don't have an answer off the top of my head. My reading of the SQLite docs would lead me to expect that a GROUP BY clause would not change/remove type if the column being grouped contains all the same declared type affinity.

+ seth

> Best,
> Jia
> On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon <> wrote:
>> Hi Jia,
>> On Mon, Feb 28, 2011 at 6:57 PM, chen jia <> wrote:
>>> The .schema of table annual_data3 is
>>> sqlite> .schema annual_data3
>>> CREATE TABLE "annual_data3"(
>>>  DATE INT,
>>>  PRC REAL,
>>>  RET REAL,
>>>  ...
>>>  pret_var,
>>>  pRET_sd,
>>>  nmret,
>>>  pya_var,
>> [snip]
>> Is there a reason that you've told SQLite the expected data type for
>> only some of the columns?
>>> Interestingly, I find that the problem I reported does not for columns
>>> labeled real in the schema info. For example, the type of column RET
>>> never changes no matter what the first observation is.
>> Yes, that is expected and I think it is the solution to your problem:
>> setup your schema so that all columns have a declared type.  For some
>> details on SQLite's type system see
>> RSQLite currently maps NA values to NULL in the database.  Pulling
>> data out of a SELECT query, RSQLite uses the sqlite3_column_type
>> SQLite API to determine the data type and map it to an R type.  If
>> NULL is encountered, then the schema is inspected using
>> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
>> the data is mapped to a character vector at the R level.  The type
>> selection is done once after the first row has been fetched.
>> To work around this you can:
>> - make sure your schema has well defined
>>  types (which will help SQLite perform its operations);
>> - check whether the returned column has the expected type and convert
>>  if needed at the R level.
>> - remove NA/NULL values from the db or decide on a different way of
>>  encoding them (e.g you might be able to use -1 in the db in some
>>  situation to indicate missing).  Your R code would then need to map
>>  these to proper NA.
>> Hope that helps.
>> + seth
>> --
>> Seth Falcon | @sfalcon |
> --
> 700 Fisher Hall
> 2100 Neil Ave.
> Columbus, Ohio  43210

Seth Falcon | @sfalcon |

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 01 Mar 2011 - 19:59:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Mar 2011 - 20:00:18 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive