Re: R-alpha: read.table -- programming 'contest'

Martin Maechler (maechler@stat.math.ethz.ch)
Wed, 5 Mar 97 15:25:45 +0100


Date: Wed, 5 Mar 97 15:25:45 +0100
Message-Id: <9703051425.AA01252@>
From: Martin Maechler <maechler@stat.math.ethz.ch>
To: r-testers@stat.math.ethz.ch
In-Reply-To: <XFMail.970305095423.plummer@iarc.fr> (message from Martyn Plummer)
Subject: Re: R-alpha: read.table -- programming 'contest' 

>>>>> "Martyn" == Martyn Plummer <plummer@iarc.fr> writes:

    Martyn> Looking at the code for read.table, I see that it reads the
    Martyn> whole dataset in as character data (using the scan() function),
    Martyn> before coercing it to numeric or factor data with the function
    Martyn> type.convert. Could this be the reason for the excessive memory
    Martyn> usage? Try using the scan() function instead.
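
Just to illustrate Martyn's suggestion (this is not his code; the file name,
column names and modes below are made up): give scan() a typed 'what' list and
the columns are read in their final mode right away, so the big all-character
intermediate never exists.

	## read three typed columns directly; adjust the 'what' list to the file
	dat <- scan("mydata.dat",
	            what = list(id = 0, group = "", value = 0),
	            skip = 1)            # skip a header line, if there is one
	dat <- as.data.frame(dat)        # wrap the list up as a data frame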


This makes for our first "R programmer's contest" (;-):

Who writes the "best"   
	read.table  "drop-in replacement" in R (no C code)?

'best':= "Sum" of the following criteria:
	1) Must have at least the functionality of read.table() in 0.16.1.
	2) CPU usage when reading medium / large datasets.
	3) Memory usage when reading medium / large datasets.
	4) Elegance of code.

[and  "Who is the jury?"  ;-)]
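
Just to fix ideas, a rough, untested sketch of the kind of pure-R entry meant
here (it uses today's function names, ignores quoting, comment lines, row
names and most of read.table()'s arguments, and assumes a simple
single-character separator -- so it would not yet score well on criterion 1):

	read.table2 <- function(file, header = FALSE, sep = " ")
	{
	    fields <- strsplit(readLines(file), sep)
	    if (header) {
	        nms    <- fields[[1]]
	        fields <- fields[-1]
	    }
	    nc   <- length(fields[[1]])
	    cols <- lapply(1:nc, function(j)
	                   type.convert(sapply(fields, "[", j), as.is = FALSE))
	    names(cols) <- if (header) nms else paste("V", 1:nc, sep = "")
	    as.data.frame(cols)
	}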

----------
Actually, I think it does NOT make sense to go for pure R code
(with no C code or system(..) calls).
As long as we are using Unix (or Windows NT/95  with  GNU Unix tools ??),
the most efficient solution (w/o C code) will probably use
	system( ... sed / awk / perl ...).
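
For instance (the file name, the columns picked out and the temporary file
are all made up; awk is assumed to be on the path and the file to be
whitespace-separated):

	## let awk pull out two numeric columns, then scan() the result
	tmp <- tempfile()
	system(paste("awk '{ print $1, $3 }' mydata.dat >", tmp))
	xy  <- scan(tmp, what = list(x = 0, y = 0))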

But then, read.table(.) will be much harder to port to Windows / Mac....

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-testers mailing list -- For info or help, send "info" or "help",
To [un]subscribe, send "[un]subscribe"
(in the "body", not the subject !)  To: r-testers-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-