Re: [Rd] Bug in read.table?

From: Tony Plate <tplate_at_acm.org>
Date: Sun, 07 Nov 2010 11:17:37 -0500

The problem has to do with the quote characters in the data: R is probably interpreting the minutes (') and seconds (") marks as quote characters, i.e. as delimiters of quoted strings.

With a smaller data file, I can reproduce the strange behavior.

read.table() can read the data correctly if given quote="" to disable the interpretation of quote chars.

Contents of tmp2.txt:

  37.8275120694  -1.2077972583 00112'28.07013"W 03749'39.04345"N
  37.8275121083  -1.2077974806 00112'28.07093"W 03749'39.04359"N
  37.8275118539  -1.2077974338 00112'28.07076"W 03749'39.04267"N
  37.8275119923  -1.2077974626 00112'28.07087"W 03749'39.04317"N


 > read.table(file.path("tmp2.txt"), header=FALSE, as.is=TRUE)
         V1        V2                V3                V4
1 37.82751 -1.207797 00112'28.07076"W 03749'39.04267"N
2 37.82751 -1.207797 00112'28.07087"W 03749'39.04317"N
3 37.82751 -1.207797 00112'28.07013"W 03749'39.04345"N
4 37.82751 -1.207797 00112'28.07093"W 03749'39.04359"N
5 37.82751 -1.207797 00112'28.07076"W 03749'39.04267"N
6 37.82751 -1.207797 00112'28.07087"W 03749'39.04317"N
Warning message:
In read.table(file.path("tmp2.txt"), header = FALSE, as.is = TRUE) :
   incomplete final line found by readTableHeader on 'tmp2.txt'

 > read.table(file.path("tmp2.txt"), header=FALSE, as.is=TRUE, quote="")
         V1        V2                V3                V4
1 37.82751 -1.207797 00112'28.07013"W 03749'39.04345"N
2 37.82751 -1.207797 00112'28.07093"W 03749'39.04359"N
3 37.82751 -1.207797 00112'28.07076"W 03749'39.04267"N
4 37.82751 -1.207797 00112'28.07087"W 03749'39.04317"N
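
The quote="" result can also be checked without a pre-existing file. The following is only a minimal sketch: the temporary file stands in for tmp2.txt, and the variable names (sample_lines, tmp, n_noquote) are made up for illustration. It uses count.fields() from utils to confirm that, once quote processing is off, every line splits into four whitespace-separated fields.

```r
# The four sample lines from tmp2.txt; the " marks are escaped for R.
sample_lines <- c(
  "  37.8275120694  -1.2077972583 00112'28.07013\"W 03749'39.04345\"N",
  "  37.8275121083  -1.2077974806 00112'28.07093\"W 03749'39.04359\"N",
  "  37.8275118539  -1.2077974338 00112'28.07076\"W 03749'39.04267\"N",
  "  37.8275119923  -1.2077974626 00112'28.07087\"W 03749'39.04317\"N"
)

# Temporary file used as a stand-in for tmp2.txt (an assumption of this sketch).
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# quote = "" disables quote processing, so ' and " are ordinary characters.
dat <- read.table(tmp, header = FALSE, as.is = TRUE, quote = "")

# Per-line field counts with quote processing disabled.
n_noquote <- count.fields(tmp, quote = "")

print(dat)
print(n_noquote)
```

Calling count.fields(tmp) with its default quote argument on the same file gives a quick way to see whether the quote marks change how the lines are split.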

The docs for read.table() direct the reader to the docs for scan() regarding the behavior with embedded quote chars. The behavior of read.table() on this data with the default quote chars is puzzling though.
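
Applied to the reading code quoted below, the workaround might look like the following sketch. The function name read_grafnav and its arguments names_skip and data_skip are hypothetical; the skip values (16 and 18) and the check for 8 column names are taken from the original poster's code, and quote="" is passed to both scan() and read.table().

```r
# Hypothetical helper, not part of the original post: reads one export file
# with quote processing disabled in both scan() and read.table().
read_grafnav <- function(path, names_skip = 16, data_skip = 18) {
  # The column names are on a single line after the header block.
  dat.names <- scan(path, what = "character", skip = names_skip,
                    nlines = 1, quote = "", quiet = TRUE)
  if (length(dat.names) != 8) {
    stop("Input file seems to be wrong!")
  }
  # quote = "" keeps the ' and " marks in the text columns as plain characters.
  read.table(path, header = FALSE, col.names = dat.names,
             skip = data_skip, as.is = TRUE, blank.lines.skip = FALSE,
             quote = "")
}
```

Usage would be something like dat <- read_grafnav("path_and_filename"). Note that read.table() passes the names through make.names() by default, so "H-Ell" becomes "H.Ell", as in the output quoted below.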

On 11/5/2010 5:22 PM, jgarcia_at_ija.csic.es wrote:
> Hi,
>
> I'm writing to this list as I'm puzzled about the behaviour of
> read.table(). It is hard to believe that there is a bug in this utils'
> function, but for my:
>
> R version 2.12.0 alpha (2010-09-28 r53056)
>
> I'm using scan and read.table to read a number of files, which are as:
>
> ---
>
> Project: Murta Sonda
> Program: GrafNav Version 8.30.1007
> Profile: javier
> Source: GPS Epochs(Combined)
> ProcessInfo: Run (1) by Unknown on 11/04/2010 at 19:05:17
>
> Datum: WGS84, (processing datum)
> Master 1: Name LaMurta, Status ENABLED
> Antenna height 2.066 m, to L1-PC (NOV702GG, MeasDist 1.980 m
> to mark/ARP)
> Position 37 49 38.15069, -1 12 27.55445, 368.197 m (WGS84,
> Ellipsoidal hgt)
> Remote: Antenna height 1.781 m, to L1-PC (NOV702GG, MeasDist 1.695 m
> to mark/ARP)
> UTC Offset: 15 s
> Local time: +2.0 h, CEST [Central European Savings Time]
> Geoid: EGM2008-World.wpg (Absolute correction)
>
> Latitude Longitude LonTextLoTextLongitudTextL
> LatTextLaTextLatitudeTextL H-Ell H-MSL LocalUTCDa
> LocalUTC
> (Deg) (Deg) (DeMi (Sec) (DeMi (Sec) (m)
> (m) (DMY) (HMS)
> 37.8275120694 -1.2077972583 00112'28.07013"W 03749'39.04345"N
> 368.998 318.059 25/10/2010 16:59:00
> 37.8275121083 -1.2077974806 00112'28.07093"W 03749'39.04359"N
> 368.994 318.055 25/10/2010 16:59:15
> 37.8275118539 -1.2077974338 00112'28.07076"W 03749'39.04267"N
> 368.997 318.058 25/10/2010 16:59:30
> 37.8275119923 -1.2077974626 00112'28.07087"W 03749'39.04317"N
> 368.998 318.060 25/10/2010 16:59:45
> 37.8275323099 -1.2078075891 00112'28.10732"W 03749'39.11632"N
> 368.869 317.930 25/10/2010 17:00:00
> 37.8275323374 -1.2078077002 00112'28.10772"W 03749'39.11641"N
> 368.866 317.927 25/10/2010 17:00:15
> 37.8275325076 -1.2078075314 00112'28.10711"W 03749'39.11703"N
> 368.859 317.920 25/10/2010 17:00:30
> 37.8275325306 -1.2078075056 00112'28.10702"W 03749'39.11711"N
> 368.861 317.922 25/10/2010 17:00:45
> 37.8275323639 -1.2078075917 00112'28.10733"W 03749'39.11651"N
> 368.853 317.914 25/10/2010 17:01:00
> 37.8275326222 -1.2078076861 00112'28.10767"W 03749'39.11744"N
> 368.857 317.918 25/10/2010 17:01:15
> ---
>
> with a number of different records for each file.
>
> To read the data I'm using:
>
> ---
> dat.names <- scan(file.path("path_and_filename"),
>                   what="character",
>                   skip = 16, nlines=1)
> if(length(dat.names) != 8){
>     stop("Input file seems to be wrong!")}
>
> dat <- read.table(file.path("path_and_filename"),
>                   header=FALSE, col.names=dat.names,
>                   skip = 18, as.is=TRUE, blank.lines.skip=FALSE)
> ---
> and systematically, I'm obtaining a number of repeated records at the
> start of the input table (6 in this example). This is easily seen by
> looking at the field "LocalUTC":
>
>> dat
> Latitude Longitude LonTextLoTextLongitudTextL
> LatTextLaTextLatitudeTextL H.Ell H.MSL LocalUTCDa LocalUTC
> 1 37.82753 -1.207808 00112'28.10732"W
> 03749'39.11632"N 368.869 317.930 25/10/2010 17:00:00
> 2 37.82753 -1.207808 00112'28.10772"W
> 03749'39.11641"N 368.866 317.927 25/10/2010 17:00:15
> 3 37.82753 -1.207808 00112'28.10711"W
> 03749'39.11703"N 368.859 317.920 25/10/2010 17:00:30
> 4 37.82753 -1.207808 00112'28.10702"W
> 03749'39.11711"N 368.861 317.922 25/10/2010 17:00:45
> 5 37.82753 -1.207808 00112'28.10733"W
> 03749'39.11651"N 368.853 317.914 25/10/2010 17:01:00
> 6 37.82753 -1.207808 00112'28.10767"W
> 03749'39.11744"N 368.857 317.918 25/10/2010 17:01:15
> 7 37.82751 -1.207797 00112'28.07013"W
> 03749'39.04345"N 368.998 318.059 25/10/2010 16:59:00
> 8 37.82751 -1.207797 00112'28.07093"W
> 03749'39.04359"N 368.994 318.055 25/10/2010 16:59:15
> 9 37.82751 -1.207797 00112'28.07076"W
> 03749'39.04267"N 368.997 318.058 25/10/2010 16:59:30
> 10 37.82751 -1.207797 00112'28.07087"W
> 03749'39.04317"N 368.998 318.060 25/10/2010 16:59:45
> 11 37.82753 -1.207808 00112'28.10732"W
> 03749'39.11632"N 368.869 317.930 25/10/2010 17:00:00
> 12 37.82753 -1.207808 00112'28.10772"W
> 03749'39.11641"N 368.866 317.927 25/10/2010 17:00:15
> 13 37.82753 -1.207808 00112'28.10711"W
> 03749'39.11703"N 368.859 317.920 25/10/2010 17:00:30
> 14 37.82753 -1.207808 00112'28.10702"W
> 03749'39.11711"N 368.861 317.922 25/10/2010 17:00:45
> 15 37.82753 -1.207808 00112'28.10733"W
> 03749'39.11651"N 368.853 317.914 25/10/2010 17:01:00
> 16 37.82753 -1.207808 00112'28.10767"W
> 03749'39.11744"N 368.857 317.918 25/10/2010 17:01:15
>
> Thanks,
>
> Javier
> ---
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Received on Sun 07 Nov 2010 - 16:21:17 GMT

