Re: [R] Tools For Preparing Data For Analysis

From: Ted Harding <ted.harding_at_nessie.mcc.ac.uk>
Date: Sun, 10 Jun 2007 09:28:30 +0100 (BST)


On 10-Jun-07 02:16:46, Gabor Grothendieck wrote:

> That can be elegantly handled in R through R's object
> oriented programming by defining a class for the fancy input.
> See this post:
>   https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html
> for a simple example of that style.
> 
> On 6/9/07, Robert Wilkins <irishhacker_at_gmail.com> wrote:

>> Here are some examples of the type of data crunching you might
>> have to do.
>>
>> In response to the requests by Christophe Pallier and Martin Stevens.
>>
>> Before I started developing Vilno, some six years ago, I had
>> been working in the pharmaceuticals for eight years ( it's not
>> easy to show you actual data though, because it's all confidential
>> of course).

I hadn't heard of Vilno before (except as a variant of "Vilnius"). And it seems remarkably hard to find info about it from a Google search. The best I've come up with, searching on

  vilno data

is at
  http://www.xanga.com/datahelper

This is a blog site, apparently with postings by Robert Wilkins.

At the end of the Sunday, September 17, 2006 posting "Tedious coding at the Pharmas" is a link:

  "I have created a new data crunching programming language."    http://www.my.opera.com/datahelper

which appears to be totally empty. In another blog article:

  "go to the www.my.opera.com/datahelper site, go to the August 31    blog article, and there you will find a tarball-file to download,    called vilnoAUG2006package.tgz"

so again inaccessible; and a google on "vilnoAUG2006package.tgz" gives a single hit which is simply the same aricle.

In the Xanga blog there are a few examples of tasks which are no big deal in any programming language (and, relative to their simplicity, appear a bit cumbersome in "Vilno").

I've not seen in the blog any instance of data transformation which could not be quite easily done in any straigthforward language (even awk).

>> Lab data can be especially messy, especially if one clinical
>> trial allows the physicians to use different labs. So let's
>> consider lab data.
>> [...]

That's a fairly daunting description, though indeed not at all extreme for the sort of data that can arise in practice (and not just in pharmaceutical investigations). But the complexity is in the situation, and, whatever language you use, the writing of the program will involve the writer getting to grips with the complexity, and the complexity will be present in the code simply because of the need to accomodate all the special cases, exceptions and faults that have to be anticipated in "feral" data.

Once these have been anticipated and incorporated in the code, the actual transformations are again no big deal.

Frankly, I haven't yet seen anything "Vilno" that couldn't be accomodated in an 'awk' program. Not that I'm advocating awk for universal use (I'm not that monolithic about it). But I'm using it as my favourite example of a flexible, capable, transparent and efficient data filtering language, as far as it goes.

SO: where can one find out more about Vilno, to see what it may really be capable of that can not be done so easily in other ways?

(As is implicit in many comments in Robert's blog, and indeed also from many postings to this list over time and undoubtedly well known to many of us in practice, a lot of the problems with data files arise at the data gathering and entry stages, where people can behave as if stuffing unpaired socks and unattributed underwear randomly into a drawer, and then banging it shut).

Best wishes to all,
Ted.



E-Mail: (Ted Harding) <ted.harding_at_nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 10-Jun-07                                       Time: 09:28:10
------------------------------ XFMail ------------------------------

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 10 Jun 2007 - 08:35:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 11 Jun 2007 - 07:31:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.