Re: [Rd] read.table segfaults

From: Ben Bolker <bbolker_at_gmail.com>
Date: Fri, 26 Aug 2011 21:55:51 +0000

Scott <ncbi2r <at> googlemail.com> writes:

>
> It does look like you've got a memory issue. perhaps using
> as.is=TRUE, and/or stringsAsFactors=FALSE will help as optional arguments
> to read.table
>
> if you don't specify these sorts of things, R has to look through the
> file and figure out which columns are characters/factors etc., and so
> larger files cause more of a headache for R, I'm guessing. Hopefully someone
> else can comment further on this? I'd try toggling TRUE/FALSE for as.is and
> stringsAsFactors.
>
> do you have other objects loaded in memory as well? this file by itself
> might not be the problem - but it's a cumulative issue.
> have you checked the file structure in any other manner?
> how large (Mb/kb) is the file that you're trying to read?
> if you just read in parts of the file, is it okay?
> read.table(filename,header=FALSE,sep="\t",nrows=100)
> read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)
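  To act on the file-structure and file-size questions quoted above, something along these lines might help. It is only a sketch: check_structure is a made-up helper name, the temporary file is a small stand-in for the real data, and I am assuming a tab-delimited file with no quoting or comment characters. count.fields() and file.info() are base R.

```r
# Hypothetical helper: summarise a delimited file without building a data frame.
check_structure <- function(path, sep = "\t") {
  # count.fields() returns the number of fields found on each non-blank line
  n_fields <- count.fields(path, sep = sep, quote = "", comment.char = "")
  list(
    size_mb      = file.info(path)$size / 1024^2,  # file size in megabytes
    n_lines      = length(n_fields),               # non-blank lines seen
    field_counts = table(n_fields)                 # how many lines have each width
  )
}

# Small stand-in file, since the original data are not available
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:5, b = letters[1:5]), tmp,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
info <- check_structure(tmp)
print(info$field_counts)
```

If more than one field count shows up in the table, the file is ragged somewhere, which would be worth knowing before blaming memory.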

  There seem to be two issues here:

  1. what can the original poster (OP) do to work around this problem? (e.g. get the data into a relational database and import it from there; use something from the High-Performance Computing task view such as ff or data.table ...)
  2. reporting a bug -- according to the R FAQ, any low-level (segmentation-fault-type) crash of R when one is not messing around with dynamically loaded code constitutes a bug. Unfortunately, debugging problems like this is a huge pain in the butt.
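  As one possible base-R workaround for issue 1 (not one of the options listed above, just a sketch), the file could be read in blocks from an open connection with the column types declared up front, so read.table never has to guess types or hold one huge parse in flight. The helper name read_in_chunks, the chunk size, and the two-column layout of the stand-in file are all assumptions for illustration.

```r
# Hypothetical helper: read a delimited file in blocks of `chunk_rows` lines.
read_in_chunks <- function(path, col_classes, chunk_rows = 1e5, sep = "\t") {
  con <- file(path, open = "r")
  on.exit(close(con))
  chunks <- list()
  repeat {
    # With an already-open connection, read.table continues from the
    # current position rather than starting over at the top of the file.
    chunk <- tryCatch(
      read.table(con, header = FALSE, sep = sep, nrows = chunk_rows,
                 colClasses = col_classes, stringsAsFactors = FALSE),
      # Simplification: any error (including "no lines available in input"
      # at end of file) is treated as "nothing more to read".
      error = function(e) NULL
    )
    if (is.null(chunk)) break
    chunks[[length(chunks) + 1L]] <- chunk
    if (nrow(chunk) < chunk_rows) break  # short block means end of file
  }
  do.call(rbind, chunks)
}

# Small stand-in file with one character and one numeric column
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(label = rep(LETTERS[1:5], 2), value = 1:10 / 2), tmp,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
d <- read_in_chunks(tmp, col_classes = c("character", "numeric"), chunk_rows = 4)
str(d)
```

Reading in blocks also tells you roughly where in the file things go wrong, if they do. Note that a genuine segfault will still take down the R session; tryCatch only handles ordinary R errors.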

  Goran, can you randomly or systematically generate an object of this size, write it to disk, read it back in, and generate the same error? In other words, does something like

set.seed(1001)
## 11e6 rows: one label column plus 17 constant numeric columns
d <- data.frame(label=rep(LETTERS[1:11],1e6),
                values=matrix(rep(1.0,11*17*1e6),ncol=17))
write.table(d,file="big.txt")
read.table("big.txt")

do the same thing?

Reducing it to this kind of reproducible example will make it possible for others to debug it without needing to gain access to your huge file ...
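If the full-size example is too slow or too big to iterate on, one way to narrow it down might be to wrap the same round trip in a function and grow the size step by step, looking for the smallest size that fails. This is only a sketch: roundtrip is a made-up helper name, the sizes shown are deliberately tiny, and the layout (one label column, 17 constant numeric columns) is just copied from the example above.

```r
# Hypothetical helper: build a data frame with 11 * reps rows, write it to
# disk, read it back, and report row counts and file size.
roundtrip <- function(reps, path = tempfile(fileext = ".txt")) {
  d <- data.frame(label = rep(LETTERS[1:11], reps),
                  values = matrix(rep(1.0, 11 * 17 * reps), ncol = 17))
  write.table(d, file = path)
  back <- read.table(path)
  data.frame(reps         = reps,
             rows_written = nrow(d),
             rows_read    = nrow(back),
             file_mb      = file.info(path)$size / 1024^2)
}

# Grow by a factor of 10 each step; extend the vector toward 1e6 as needed
results <- do.call(rbind, lapply(c(1e1, 1e2, 1e3), roundtrip))
print(results)
```

The last size that completes and the first that crashes would both be useful to include in a bug report, along with sessionInfo().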



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Fri 26 Aug 2011 - 22:33:42 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 27 Aug 2011 - 10:00:25 GMT.

