Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

From: Seth Falcon <seth_at_userprimary.net>
Date: Tue, 01 Mar 2011 11:56:53 -0800

On Tue, Mar 1, 2011 at 10:06 AM, chen jia <chen_1002_at_fisher.osu.edu> wrote:
> Hi Seth,
>
> Thanks so much for identifying the problem and explaining everything.
> I think the first solution that you suggest--make sure the schema has
> well defined types--would work the best for me. But, I have one
> question about how to implement it, which is more about sqlite itself.
>
> First, I found out that the columns that don't have the expected data
> types in the table annual_data3 are created by aggregate functions in
> a separate table. These columns are later combined with other columns
> that do.
>
> I read the link that you provide,
> http://www.sqlite.org/datatype3.html. One paragraph says "When
> grouping values with the GROUP BY clause values with different storage
> classes are considered distinct, except for INTEGER and REAL values
> which are considered equal if they are numerically equal. No
> affinities are applied to any values as the result of a GROUP by
> clause."
>
> If I understand it correctly, the columns created by aggregate
> functions with a GROUP by clause do not have any expected data types.
>
> My solution is to use CREATE TABLE clause to declare the expected
> datatype and then insert the values of columns created by the
> aggregate functions with the GROUP by clause. However, this solution
> requires a CREATE TABLE cause every time the aggregate function and
> the GROUP by clause is used.
>
> My question is: Is this the best way to make sure that the columns as
> a result of a GROUP by clause have the expected data types? Thanks.

That might be a good question to post to the SQLite user's list :-)

I don't have an answer off the top of my head. My reading of the SQLite docs would lead me to expect that a GROUP BY clause would not change/remove type if the column being grouped contains all the same declared type affinity.

+ seth

>
> Best,
> Jia
>
> On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon <seth_at_userprimary.net> wrote:
>> Hi Jia,
>>
>> On Mon, Feb 28, 2011 at 6:57 PM, chen jia <chen_1002_at_fisher.osu.edu> wrote:
>>> The .schema of table annual_data3 is
>>> sqlite> .schema annual_data3
>>> CREATE TABLE "annual_data3"(
>>>  PERMNO INT,
>>>  DATE INT,
>>>  CUSIP TEXT,
>>>  EXCHCD INT,
>>>  SICCD INT,
>>>  SHROUT INT,
>>>  PRC REAL,
>>>  RET REAL,
>>>  ...
>>>  pret_var,
>>>  pRET_sd,
>>>  nmret,
>>>  pya_var,
>>
>> [snip]
>>
>> Is there a reason that you've told SQLite the expected data type for
>> only some of the columns?
>>
>>> Interestingly, I find that the problem I reported does not for columns
>>> labeled real in the schema info. For example, the type of column RET
>>> never changes no matter what the first observation is.
>>
>> Yes, that is expected and I think it is the solution to your problem:
>> setup your schema so that all columns have a declared type.  For some
>> details on SQLite's type system see
>> http://www.sqlite.org/datatype3.html.
>>
>> RSQLite currently maps NA values to NULL in the database.  Pulling
>> data out of a SELECT query, RSQLite uses the sqlite3_column_type
>> SQLite API to determine the data type and map it to an R type.  If
>> NULL is encountered, then the schema is inspected using
>> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
>> the data is mapped to a character vector at the R level.  The type
>> selection is done once after the first row has been fetched.
>>
>> To work around this you can:
>>
>> - make sure your schema has well defined
>>  types (which will help SQLite perform its operations);
>>
>> - check whether the returned column has the expected type and convert
>>  if needed at the R level.
>>
>> - remove NA/NULL values from the db or decide on a different way of
>>  encoding them (e.g you might be able to use -1 in the db in some
>>  situation to indicate missing).  Your R code would then need to map
>>  these to proper NA.
>>
>> Hope that helps.
>>
>> + seth
>>
>>
>>
>> --
>> Seth Falcon | @sfalcon | http://userprimary.net/
>>
>
>
>
> --
> 700 Fisher Hall
> 2100 Neil Ave.
> Columbus, Ohio  43210
> http://www.fisher.osu.edu/~chen_1002/
>

-- 
Seth Falcon | @sfalcon | http://userprimary.net/

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 01 Mar 2011 - 19:59:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Mar 2011 - 20:00:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive