[R] New vocabulary on a Friday afternoon. Was: Improving data processing efficiency

From: Greg Snow <Greg.Snow_at_imail.org>
Date: Fri, 06 Jun 2008 13:14:20 -0600

I still like option number 4, so I think we need to come up with a formal definition for a "junk" of data. I read somewhere that Tukey coined the word "bit" as it applies to computers; we can share the credit (or blame) for "junks" of data.

My proposal for a statistical/data definition of the word junk:

Junk (noun):
A quantity of data just large enough to get the client excited about the "great" dataset they provided, but not large enough to draw any useful conclusions from.

Example sentence: We just received another junk of data from the boss; who gets to give him the bad news that it still does not prove his pet theory?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111




> -----Original Message-----
> From: Patrick Burns [mailto:pburns@pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help_at_r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
>
> Gabor Grothendieck wrote:
> > On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow_at_imail.org> wrote:
> >
> >>> -----Original Message-----
> >>> From: r-help-bounces_at_r-project.org
> >>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
> >>> Sent: Friday, June 06, 2008 12:04 PM
> >>> To: Daniel Folkinshteyn
> >>> Cc: r-help_at_r-project.org
> >>> Subject: Re: [R] Improving data processing efficiency
> >>>
> >>> That is going to be situation dependent, but if you have a
> >>> reasonable upper bound, then that will be much easier and not far
> >>> from optimal.
> >>>
> >>> If you pick the possibly too small route, then increasing the size
> >>> in largish junks is much better than adding a row at a time.
> >>>
> >> Pat,
> >>
> >> I am unfamiliar with the use of the word "junk" as a unit of measure for data objects. I figure there are a few different possibilities:
> >>
> >> 1. You are using the term intentionally, meaning that you suggest he increase the size in terms of old cars and broken pianos rather than used-up pens and broken pencils.
> >>
> >> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
> >>
> >> 3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a Microsoft "correction", or just typing too fast combined with number 2).
> >>
> >> 4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).
> >>
> >>
> >
> > 5. Chinese sailing vessel.
> > http://en.wikipedia.org/wiki/Junk_(ship)
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
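Behind the joke, Patrick's advice in the quoted thread (preallocate to a reasonable upper bound, or else grow the object in largish chunks rather than one row at a time) comes down to how often the rows already stored get copied. The following is a Python simulation, not R code, and it assumes that every reallocation copies all rows stored so far, which is a simplified model of enlarging an R object with rbind() inside a loop. The function copies_needed and the growth rules passed to it are hypothetical illustrations, not anything from the original thread.

```python
def copies_needed(n_rows, grow):
    """Count row copies made while appending n_rows rows, one at a time.

    `grow` maps the current capacity to the new capacity when the buffer
    is full. Assumption (simplified model): every reallocation copies all
    rows stored so far into the new, larger buffer.
    """
    capacity = 0
    stored = 0
    copies = 0
    for _ in range(n_rows):
        if stored == capacity:
            capacity = grow(capacity)
            copies += stored  # move the existing rows into the bigger buffer
        stored += 1
    return copies


N = 10_000  # hypothetical number of rows to collect

strategies = {
    "one row at a time": lambda c: c + 1,
    "chunks of 1000 rows": lambda c: c + 1000,
    "double the capacity": lambda c: max(1, 2 * c),
    # Assumes N really is an upper bound on the number of rows.
    "preallocate upper bound": lambda c: N,
}

for name, grow in strategies.items():
    print(f"{name:>24}: {copies_needed(N, grow):>12,} row copies")
```

Under this model, 10,000 rows cost 49,995,000 row copies when growing one row at a time, 45,000 with 1,000-row chunks, 16,383 when doubling the capacity, and none when preallocating. Doubling is a swapped-in technique not mentioned in the thread: fixed-size chunks only divide the quadratic copying cost by the chunk size, while geometric growth keeps the total copying proportional to the number of rows.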
Received on Fri 06 Jun 2008 - 19:18:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 19:30:40 GMT.

