Re: [Rd] read.table segfaults

From: Göran Broström <goran.brostrom_at_gmail.com>
Date: Sat, 27 Aug 2011 11:27:47 +0200

On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <bbolker_at_gmail.com> wrote:
> Scott <ncbi2r <at> googlemail.com> writes:
>
>>
>> It does look like you've got a memory issue. Perhaps using
>>   as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments
>> to read.table will help.
>>
>> If you don't specify these sorts of things, R may have to look through the
>> file and figure out which columns are characters/factors etc., so
>> larger files cause more of a headache for R, I'm guessing. Hopefully someone
>> else can comment further on this? I'd try toggling TRUE/FALSE for as.is and
>> stringsAsFactors.
>>
>>    Do you have other objects loaded in memory as well? This file by itself
>> might not be the problem; it could be a cumulative issue.
>>    Have you checked the file structure in any other way?
>>    How large (MB/kB) is the file that you're trying to read?
>>    If you just read in parts of the file, is it okay?
>>       read.table(filename,header=FALSE,sep="\t",nrows=100)
>>       read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)
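Scott's idea of reading the file in parts can be extended by reading it in consecutive blocks, so that any error reports roughly where in the file it occurs. What follows is a minimal base-R sketch. The names `read_in_chunks` and `chunk_size` are hypothetical. The sketch assumes one record per line and no header line, and it passes any read.table() arguments (`sep`, `as.is`, `colClasses`, ...) straight through. Supplying `colClasses` also spares read.table() the type guessing mentioned above, and it keeps the column types identical in every block.

```r
# Hypothetical helper: read a headerless text file in blocks of chunk_size lines.
# Assumes one record per line (no newlines embedded in quoted fields).
# Extra arguments (sep, as.is, colClasses, ...) are passed to read.table().
read_in_chunks <- function(path, chunk_size = 100000L, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  chunks <- list()
  first_line <- 1L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break            # end of file reached
    chunk <- tryCatch(
      read.table(text = lines, ...),
      error = function(e) {
        # Report which block failed, counted in physical lines of the file
        stop(sprintf("chunk starting at line %d failed: %s",
                     first_line, conditionMessage(e)), call. = FALSE)
      }
    )
    chunks[[length(chunks) + 1L]] <- chunk
    first_line <- first_line + length(lines)
  }
  do.call(rbind, chunks)
}
```

This approach still builds the whole data frame in memory. It narrows down where a failure happens, but it does not reduce memory use.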

>
>  There seem to be two issues here:
>
> 1. what can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational database and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)

Interestingly, the text file was created by a selection from an SQL database. I have access to 'db2' on an Ubuntu machine, and at the bash prompt I run

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09 select linneid, fodelsear, kon, ....... from u09021.fil2
connect reset
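The `modified by coldelX09` clause makes DB2 write a tab between columns, so R can read the export with `sep = "\t"`. What follows is a minimal sketch. It assumes DB2's default DEL conventions, meaning character fields enclosed in double quotes and no header line. The name `read_del_export` is hypothetical. Restricting `quote` to the double quote and switching off `comment.char` are defensive choices. They stop a stray apostrophe or `#` in an unquoted field from being read as the start of a quote or a comment.

```r
# Hypothetical helper: read a DB2 DEL export written with coldelX09 (tab-delimited).
# Assumptions: no header line, strings enclosed in double quotes.
read_del_export <- function(path, colClasses = NA) {
  read.table(path,
             header = FALSE,
             sep = "\t",            # coldelX09: a tab between columns
             quote = "\"",          # only double quotes delimit strings
             comment.char = "",     # a '#' in the data is not a comment
             as.is = TRUE,          # keep strings as character, no factors
             colClasses = colClasses)
}
```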

How do I get a direct connection between R and the database 'linnedb'?
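One possible route, untested against this setup, is the RODBC package, which talks to DB2 through ODBC. The following is a hypothetical sketch. It assumes RODBC is installed and that an ODBC data source named `linnedb` has already been configured for the database. The helper `fetch_fil2` and its arguments are illustrative names only. The query uses just the three column names visible in the SQL above, because the rest of that list is elided.

```r
# Hypothetical helper: query DB2 directly over ODBC with the RODBC package.
# Assumes an ODBC DSN (default "linnedb") points at the DB2 database.
fetch_fil2 <- function(uid, pwd, dsn = "linnedb") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  if (!inherits(channel, "RODBC"))          # odbcConnect returns -1 on failure
    stop("could not connect to ODBC data source ", dsn)
  on.exit(RODBC::odbcClose(channel))        # always release the connection
  # Only the columns named in the original SQL; extend the list as needed.
  RODBC::sqlQuery(channel,
                  "select linneid, fodelsear, kon from u09021.fil2",
                  stringsAsFactors = FALSE)
}
```

If this works, it would bypass the intermediate text file and read.table() altogether.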

> 2. reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
>  Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error?  In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>                values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?
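A scaled-down version of this test can be rerun quickly at increasing sizes to find the point where reading starts to fail. The following sketch uses the same construction as the code above, with a hypothetical parameter `n` in place of `1e6`. The name `round_trip_ok` is also hypothetical. When the header line has one field fewer than the data lines, read.table() sets `header = TRUE` automatically and takes the row names from the first column.

```r
# Build the same kind of data frame as above, scaled by n
# (n = 1e6 gives 11e6 rows), write it out and read it back.
round_trip_ok <- function(n, path = tempfile(fileext = ".txt")) {
  on.exit(unlink(path))                       # remove the temporary file
  d <- data.frame(label = rep(LETTERS[1:11], n),
                  values = matrix(rep(1.0, 11 * 17 * n), ncol = 17))
  write.table(d, file = path)                 # row names + 18 columns
  back <- read.table(path)                    # header and row names detected
  identical(dim(back), dim(d)) &&
    identical(as.character(back$label), as.character(d$label)) &&
    all(as.matrix(back[, -1]) == 1)
}
```

For example, `sapply(c(1e3, 1e4, 1e5), round_trip_ok)` checks several sizes in one call.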

No, but I get new errors:

> ss <- read.table("big.txt")
Error in read.table("big.txt") : duplicate 'row.names' are not allowed

(there are no duplicates)
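The absence of duplicates can be checked without read.table() by looking only at the first field of each data line. What follows is a minimal sketch. It assumes the file was written by write.table() as above, so it has a header line followed by whitespace-separated fields with the row name first. The name `first_field_duplicates` is hypothetical.

```r
# Hypothetical helper: return the repeated values of the first field
# (the row names, as written by write.table) among the data lines.
first_field_duplicates <- function(path) {
  lines <- readLines(path)[-1]               # drop the header line
  keys <- sub("[[:space:]].*$", "", lines)   # text before the first blank
  unique(keys[duplicated(keys)])             # each repeated key once
}
```

An empty result would suggest that the row names in the file are unique. In that case the "duplicate 'row.names'" message originates in the reading code rather than in the file.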

I tried to add an item to the first line and

> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 10610008 did not have 19 elements

which is wrong; that line has 19 elements.
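count.fields() counts the fields on each line with the same defaults as read.table(): white space as the separator, `quote = "\"'"`, and `comment.char = "#"`. That makes it a way to check how many fields R itself sees on a given line. What follows is a minimal sketch. The name `lines_with_wrong_count` is hypothetical, and the reported line numbers assume that no quoted field spans more than one line.

```r
# Hypothetical helper: list lines whose field count differs from `expected`.
# Uses count.fields() with read.table()'s default sep/quote/comment.char;
# blank lines are kept so that positions match line numbers in the file.
lines_with_wrong_count <- function(path, expected, skip = 0L) {
  n <- count.fields(path, skip = skip, blank.lines.skip = FALSE)
  bad <- which(is.na(n) | n != expected)
  data.frame(line = bad + skip, fields = n[bad])
}
```

For example, after adding the item to the header line, `lines_with_wrong_count("big.txt", expected = 19)` should return no rows. If it returns none while read.table() still complains, the fault would seem to lie in the reading code and not in the file.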

Göran

> Reducing it to this kind of reproducible example will make
> it possible for others to debug it without needing to gain
> access to your huge file ...
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Göran Broström

Received on Sat 27 Aug 2011 - 09:31:23 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 27 Aug 2011 - 12:20:23 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
