Re: [R] Tools For Preparing Data For Analysis

From: Christophe Pallier
Date: Fri, 08 Jun 2007 22:38:58 +0200

On 6/8/07, Douglas Bates wrote:
> Other responses in this thread have mentioned 'little language'
> filters like awk, which is fine for those who were raised in the Bell
> Labs tradition of programming ("why type three characters when two
> character names should suffice for anything one wants to do on a
> PDP-11") but the typical field scientist finds this a bit too terse to
> understand and would rather write a filter as a paragraph of code that
> they have a chance of reading and understanding a week later.


Concerning awk, I think that this comment does not apply: because the language is simple and somewhat limited, awk scripts are typically quite clean and readable (of course, it is possible to write horrible code in any language).

I have introduced awk to dozens of people (mostly social scientists, and DOS/Windows users...) over the last 15 years. It is sometimes the only programming language they know, and they are very happy with what they can do with it.

The philosophy of using it as a filter (that is, a converter) is also good because many problems are best solved in two or three steps (two or three short scripts run in sequence) rather than in a single step, as people tend to do with languages that encourage the use of data structures more complex than associative arrays.
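To make the multi-step filter idea concrete, here is a minimal sketch. It is written in Python rather than awk, and the input format (whitespace-separated lines holding a subject id and a reaction time) and the function names are hypothetical, chosen only for illustration. Each function plays the role of one short awk script: the first drops malformed lines, the second aggregates per subject using an associative array (a dict).

```python
from collections import defaultdict

def step1_clean(lines):
    """Filter 1 (hypothetical): keep lines with exactly two fields
    whose second field parses as a number; drop everything else."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # e.g. comment lines or truncated records
        try:
            float(fields[1])
        except ValueError:
            continue  # e.g. missing values coded as "NA"
        yield f"{fields[0]} {fields[1]}"

def step2_mean_by_subject(lines):
    """Filter 2 (hypothetical): mean of the second column per subject,
    accumulated in dicts used like awk associative arrays."""
    total = defaultdict(float)
    count = defaultdict(int)
    for line in lines:
        subject, value = line.split()
        total[subject] += float(value)
        count[subject] += 1
    for subject in sorted(total):
        yield f"{subject} {total[subject] / count[subject]:g}"

raw = [
    "s1 512",
    "s1 488",
    "# comment line",
    "s2 601",
    "s2 NA",
    "s2 599",
]

# Run the two filters in sequence, analogous to a shell pipeline such as
# `awk -f clean.awk data.txt | awk -f mean.awk` (script names are hypothetical).
for out in step2_mean_by_subject(step1_clean(raw)):
    print(out)
```

Because each step reads lines and writes lines, either one can be checked on its own, replaced, or reused, and the final output is plain text that R can read directly with `read.table`.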

It could be argued that awk is the Swiss Army knife of simple text manipulation. All in all, awk + R is a very efficient combination for data manipulation (at least in the cases I have encountered).

It would be a pity if your remark led people to overlook awk, as it could efficiently solve many of the input-parsing problems that are posted on this list (I am talking here about extracting information from text files, not data entry).

awk, like R, is not free of defects, yet both are tools one gets attached to because they greatly increase one's productivity.

Christophe Pallier


______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 08 Jun 2007 - 20:47:51 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Jun 2007 - 21:32:13 GMT.
