Re: [R] Summary: Unexpected result of read.dbf

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sat 20 Aug 2005 - 02:08:36 EST

It really isn't clear that this patch is correct. The diagnosis is correct: read.dbf treats numeric fields with no decimals as integers, and that _is_ as stated on the help page. So it is definitely not a `bug', and reading the help page would have explained the original question. [In general I do not reply to questions that can be answered from the help page.]

I believe this field has been incorrectly coded as numeric, as it seems to be a factor ('keycode'). In particular, 19 is not a valid field size for a numeric field.

If one wants to allow this, I think we have to use double for a field in which any value is not representable as an integer, and not just if the field size exceeds 9. I have been working on implementing that.

On Fri, 19 Aug 2005, Susumu Tanimura wrote:

> Hi there,
>
> This is a summary of, and a patch for, a bug in read.dbf, as demonstrated in
> Message-Id: <20050818150446.697835cb.stanimura-ngs@umin.ac.jp>.
>
> After consulting Rjpwiki, an online community of R users in Japan, the
> cause was found and a patch was proposed.
>
> Overflow occurs when read.dbf reads a dbf file containing a field of
> long signed integers. For example,
>
> $ dbf2txt test.dbf
> #KEYCODE
> 422010010
> 42201002101
> 42201002102
> 42201002103
> 42201002104
> 422010060
> 422010071
> 422010072
> 42201008001
> 42201008002
>
> The KEYCODE field is of numeric type, 19 digits wide, with no decimals.
> You can create this file with OpenOffice.org Calc, txt2dbf, and so on.
> Also prepare the same data as a CSV file.
>
> > library(foreign)
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
>        KEYCODE     KEYCODE
> 1    422010010   422010010
> 2  42201002101          NA
> 3  42201002102          NA
> 4  42201002103          NA
> 5  42201002104          NA
> 6    422010060   422010060
> 7    422010071   422010071
> 8    422010072   422010072
> 9  42201008001          NA
> 10 42201008002          NA
>
> This is not reproducible when the field has decimals, for example
> numeric type, 19 digits, and 5 decimals.
>
> The patch, written by Mr. Eiji Nakama, follows.
>
> --- foreign.orig/src/dbfopen.c 2005-08-19 18:54:06.000000000 +0900
> +++ foreign/src/dbfopen.c 2005-08-19 18:58:06.000000000 +0900
> @@ -970,7 +970,8 @@
>          || psDBF->pachFieldType[iField] == 'F' )
>          /* || psDBF->pachFieldType[iField] == 'D' ) D is Date */
>      {
> -        if( psDBF->panFieldDecimals[iField] > 0 )
> +        if( psDBF->panFieldDecimals[iField] > 0 ||
> +            psDBF->panFieldSize[iField] > 9 )
>              return( FTDouble );
>          else
>              return( FTInteger );
>
> After applying the patch, read.dbf works correctly.
>
> > cbind(read.csv("test.csv"),read.dbf("test.dbf"))
>        KEYCODE     KEYCODE
> 1    422010010   422010010
> 2  42201002101 42201002101
> 3  42201002102 42201002102
> 4  42201002103 42201002103
> 5  42201002104 42201002104
> 6    422010060   422010060
> 7    422010071   422010071
> 8    422010072   422010072
> 9  42201008001 42201008001
> 10 42201008002 42201008002
>
> --
> Susumu Tanimura
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Sat Aug 20 02:15:53 2005
