OK, I can't resist that invitation. I think there are many kinds of problematic data. I handle some nasty textish things in perl (and I loved the purgatory quote) and I'm afraid I do some things in Excel and some cleaning I can handle in R, but I never enter data directly into R.

However, one very common scenario I have faced all my working life is psych data from questionnaires or interviews in low budget work, mostly student research or routine entry of therapists' data. Typically you have an identifier, a date, some demographics and then a lot of item data. There's little money (usually zero) involved for data entry and cleaning but I've produced a lot of good(ish) papers out of this sort of very low budget work over the last 20 years. (Right at the other end of the financial spectrum from the FDA/validated s'ware thread but this is about validation again!)

The problem I often face is that people are lousy data entry machines (well, actually, they vary ... enormously) and if they mess up the data entry we all know how horrible this can be.

SPSS (boo hiss) used to have an excellent "module", actually a standalone PC/Windoze program, that allowed you to define variables with a set of allowed values, and it would refuse to accept out-of-range or otherwise unacceptable entries. It also allowed you to create checking rules, and skip rules that would, in the light of earlier entries, set later values and not ask about them. In a rudimentary way you could also lay things out on the screen so that it paginated where the q'aire or paper data record did. The final nice touch was that you could define some variables as invariant and then set the thing up so an independent data entry person could re-enter the other data (i.e. pick up q'aire, see if ID fits the one showing on screen, if so, enter the rest of the data). It would bleep and not move on if you entered a value other than that entered by the first person, and you had to confirm which of you was right.
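To make the allowed-values and skip-rule ideas concrete, here is a minimal sketch in base R that checks a data frame after entry rather than during it. The column names (gender, item1, pregnant), the allowed ranges and the helper check_allowed() are all invented for illustration; this is not how the SPSS program worked internally, just one way to get a list of offending cells.

```r
# Hypothetical data: names, codes and ranges are assumptions for illustration.
dat <- data.frame(
  id       = c(1, 2, 3, 4),
  gender   = c("M", "F", "F", "X"),
  item1    = c(3, 7, 2, NA),          # item scored 0..4
  pregnant = c("N", "N", "Y", NA),    # should only be answered if gender is "F"
  stringsAsFactors = FALSE
)

# Allowed values per variable (missing values are not treated as errors here)
allowed <- list(
  gender = c("M", "F"),
  item1  = 0:4
)

# Return one row per offending cell: row number, variable name, entered value
check_allowed <- function(d, allowed) {
  problems <- lapply(names(allowed), function(v) {
    bad <- which(!is.na(d[[v]]) & !(d[[v]] %in% allowed[[v]]))
    data.frame(row = bad,
               variable = rep(v, length(bad)),
               value = as.character(d[[v]][bad]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, problems)
}

problems <- check_allowed(dat, allowed)

# Skip rule: 'pregnant' should be left missing whenever gender is "M"
skip_violations <- which(dat$gender == "M" & !is.na(dat$pregnant))
```

With the made-up data above, problems should list the gender entry in row 4 and the item1 entry in row 2, and skip_violations should point at row 1. It only catches errors after the fact, of course, which is weaker than refusing the keystroke.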

I'm sure that saved me weeks that would otherwise have been wasted analysing data that turned out to be awful, and I'd love to see someone build something to replace it.

Currently I tend to use (boo hiss) Excel for this as everyone I work with seems to have it (and not all can install open office and anyway I haven't had time to learn that properly yet either ...) and I set up spreadsheets with validation rules set. That doesn't get the branching rules and checks (e.g. if male, skip questions about periods, PMT and pregnancies), or at least, with my poor Excel skills it doesn't. I just skip a column to indicate page breaks in the q'aire, and I get, when I can, two people to enter the data separately and then use R to compare the two spreadsheets having yanked them into data frames.
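For anyone wanting to try the double-entry comparison, a rough sketch in base R follows. It assumes both entries have the same columns and a unique identifier column called id; the data and the function name compare_entries() are hypothetical, and in practice the two data frames would come from something like read.csv() on exports of the two spreadsheets.

```r
# Hypothetical double-entry data, deliberately in different row order
entry1 <- data.frame(id = c(101, 102, 103),
                     age = c(34, 51, 28),
                     item1 = c(2, 0, 4))
entry2 <- data.frame(id = c(103, 101, 102),
                     age = c(28, 43, 51),
                     item1 = c(4, 2, NA))

# List every cell where the two entries disagree, matched on the key column
compare_entries <- function(a, b, key = "id") {
  # Both entries must have the same columns and the same set of unique IDs
  stopifnot(identical(sort(names(a)), sort(names(b))),
            !anyDuplicated(a[[key]]),
            !anyDuplicated(b[[key]]),
            setequal(a[[key]], b[[key]]))
  a <- a[order(a[[key]]), ]
  b <- b[order(b[[key]]), names(a)]   # same row and column order as a
  vars <- setdiff(names(a), key)
  out <- lapply(vars, function(v) {
    x <- a[[v]]
    y <- b[[v]]
    # A mismatch is: one value missing and the other not, or both present but unequal
    differ <- xor(is.na(x), is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
    idx <- which(differ)
    data.frame(id = a[[key]][idx],
               variable = rep(v, length(idx)),
               first = as.character(x[idx]),
               second = as.character(y[idx]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

diffs <- compare_entries(entry1, entry2)
```

Each row of diffs is a cell to check against the paper q'aire. Treating "one missing, one not" as a mismatch is a design choice: a skipped cell is as much a data entry error as a mistyped one.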

I would really, really love someone to develop (and perhaps replace) the rather buggy edit() and fix() routines (they seem to hang on big data frames in Rcmdr, which is what I'm trying to get students onto) with something that did some or all of what SPSS/DE used to do for me, or what I now bodge in Excel. If any generous coding whiz were willing to do this, I'd try to alpha and beta test and write help etc.

There _may_ be good open source things out there that do what I need but something that really integrated into R would be another huge step forward in being able to phase out SPSS in my work settings and phase in R.

