Re: [Rd] CRAN policies

From: Spencer Graves <spencer.graves_at_prodsyse.com>
Date: Sat, 31 Mar 2012 10:56:04 -0700

Hi, Ted:

       Thank you for the most eloquent and complete description of the problem and opportunity I've seen in a while.

       Might you have time to review the Wikipedia articles on "Package development process" and "Software repository" (http://en.wikipedia.org/wiki/Package_development_process; http://en.wikipedia.org/wiki/Software_repository) and share with me your reactions?

       I wrote the "Package development process" article and part of the "Software repository" article, because the R package development process is superior to similar processes I've seen for other languages. However, I'm not a leading researcher on these issues, and your comments suggest that you know far more than I about this. Humanity might benefit from your review of these articles. (If you have any changes you might like to see, please make them or ask me to make them. Contributing to Wikipedia can be a very high leverage activity, as witnessed by the fact that the Wikipedia article on SOPA received a million views between the US holidays of Thanksgiving and Christmas last year.)

       Thanks again,
       Spencer


On 3/31/2012 8:29 AM, Ted Byers wrote:
>> -----Original Message-----
>> From: r-devel-bounces@r-project.org [mailto:r-devel-bounces@r-project.org]
>> On Behalf Of Paul Gilbert
>> Sent: March-31-12 9:57 AM
>> To: Mark.Bravington_at_csiro.au
>> Cc: r-devel_at_stat.math.ethz.ch
>> Subject: Re: [Rd] CRAN policies
>>
> Greetings all
>
>> Mark
>>
>> I would like to clarify two specific points.
>>
>> On 12-03-31 04:41 AM, Mark.Bravington_at_csiro.au wrote:
>> > ...
>>> Someone has subsequently decided that code should look a certain way,
>>> and has added a check that isn't in the language itself-- but they
>>> haven't thought of everything, and of course they never could.
>>
>> There is a large overlap between people writing the checks and people
>> writing the interpreter. Even though your code may have been working, if
>> your understanding of the language definition is not consistent with that
>> of the people writing the interpreter, there is no guarantee that it will
>> continue to work, and in some cases the way in which it fails could be
>> that it produces spurious results. I am inclined to think of code checks
>> as an additional way to be sure my understanding of the R language is
>> close to that of the people writing the interpreter.
>>
>>> It depends on how Notes are being interpreted, which from this thread is
>>> no longer clear.
>>> The R-core line used to be "Notes are just notes" but now we seem to
>>> have "significant Notes" and ...
>>
>> My understanding, and I think that of a few other people, was incorrect,
>> in that I thought some notes were intended always to remain as notes, and
>> others were more serious in that they would eventually become warnings or
>> errors. I think Uwe addressed this misunderstanding by saying that all
>> notes are intended to become warnings or errors. In several cases the
>> reason they are not yet warnings or errors is that the checks are not yet
>> good enough, they produce too many false positives.
>>
>> So, this means that it is very important for us to look at the notes and
>> to point out the reasons for the false positives, otherwise they may
>> become warnings or errors without being recognised as such.
>>
> I left the above intact as it nicely illustrates what much of this
> discussion reminds me of. Let me illustrate with the question of software
> development in one of my favourite languages: C++.
>
> The first issue to consider is, "What is the language definition and who
> decides?" Believe it or not, there are two answers from two very different
> perspectives. The first is favoured by language lawyers, who point to the
> ANSI standard, and who will argue incessantly about the finest of details.
> But to understand this, you have to understand what ANSI is: it is an
> industry organization and to construct the standard, they have industry
> representatives gathered, divided up into subcommittees each of which is
> charged with defining the language. And of course everyone knows that,
> being human, they can get it wrong, and thus ANSI standards evolve ever so
> slowly through time. To my mind, that is not much different from what
> R-core or CRAN are involved in. But the other answer comes from the
> perspective of a professional software developer, and that is, that the
> final arbiter of what the language is is your compiler. If you want to get
> product out the door, it doesn't matter if the standard says 'X' if the
> compiler doesn't support it, or worse, implements it incorrectly. Most
> compilers have warnings and errors, and I like the idea of extending that to
> have notes, but that is a matter of taste vs pragmatism. I know many
> software developers that choose to ignore warnings and fix only the errors.
> Their rationale is that it takes time they don't have to fix the warnings
> too. And I know others who treat all warnings as errors unless they have
> discovered that there is a compiler bug that generates spurious warnings of
> a particular kind (in which case that specific warning can usually be turned
> off). Guess which group has lower bug rates on average. I tend to fall in
> the latter group, having observed that with many of these things, you either
> fix them now or you will fix them, at greater cost, later.
>
> The second issue to consider is, "What constitutes good code, and what is
> necessary to produce it?" That I won't answer beyond saying, 'whatever
> works.' That is because it is ultimately defined by the end users'
> requirements. That is why we have software engineers who specialize in
> requirements engineering. These are bright people who translate the wish
> lists of non-technical users into functional and environmental requirements
> that the rest of us can code to. But before we begin coding, we have QA
> specialists that design a variety of tests from finely focussed unit tests
> through integration tests to broadly focussed usability tests, ending with a
> suite of tests that basically confirm that the requirements defined for the
> product are satisfied. Standard practice in good software houses is that
> nothing gets added to the codebase unless the entire codebase, with the new
> or revised code, compiles and passes the entire test suite. When new code
> stresses the codebase in such a way as to trigger a failure in the existing
> code, then when it is diagnosed and fixed, new tests are designed and added
> to the test suite codebase (which has the same requirement of everything
> building and passing all tests). Of course, some do this better than others,
> as there are reasons NASA may spend $5 per line of code while many industry
> players spend $0.05 per line of code.
>
> It is sheer folly for anyone to suggest that reliance on warnings and
> errors, even extending this to notes, ensures good code. At best, these are
> necessary to support development of good code, but they do not come close to
> being sufficient. It is trivial to find examples of C code, for computing a
> mean, variance and standard deviation, that is correct both WRT the ANSI
> standard and the compiler, and yet it is really bad code (look for single
> pass algorithms, and you'll find one of the most commonly recommended
> algorithms is also one of the worst, in terms of accuracy under some inputs,
> and yet an infrequently recommended algorithm is one of the best both in
> terms of ease of implementation, speed and accuracy). And you will still
> find good mathematicians defending the bad code by saying it is
> mathematically correct, but this is because they do not understand the
> consequences of finite precision arithmetic and rounding error.
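
The point about single-pass variance algorithms can be illustrated with a small sketch. Ted does not name the algorithms he has in mind, so this is only an assumption: the "commonly recommended" one is taken to be the textbook sum-of-squares formula, and the "infrequently recommended" one is taken to be Welford's running update. The sketch is in Python rather than C for brevity, and the function names and test data are hypothetical.

```python
def naive_variance(values):
    """One-pass textbook formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two huge, nearly equal
    numbers, so most significant digits cancel in floating point.
    Requires at least two values.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        x = float(x)
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(values):
    """One-pass Welford update: track the running mean and the sum of
    squared deviations from it. Requires at least two values."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        x = float(x)
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # updated mean
        m2 += delta * (x - mean)  # uses both old and new deviations
    return m2 / (n - 1)


# Small spread on top of a large offset; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive  :", naive_variance(data))
print("welford:", welford_variance(data))
```

Both functions compile and run without any warning, which is the point: on this input the Welford version recovers the variance of 30, while the sum-of-squares version is far off because of cancellation, and no static check flags the difference.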
>
> I would observe, as an outsider, that what CRAN is apparently doing is
> primarily focussed on the first issue above, but going beyond what the R
> interpreter does to get a better handle on a system of warnings with an
> extension to notes. The notes question I can understand as a pragmatic
> matter. If I were assigned to do the same sort of thing, I would probably
> do it in a similar manner, leaving some things as notes until both I and the
> community I serve develop a better understanding of the issues involved in
> the subject of the notes to the point of being better able to either have
> them evolve into more precisely defined warnings or die. I understand that
> getting many of these things can get tedious and time consuming, but in fact
> there is no other way for a community to analyse the issues involved and
> develop a good understanding of how best to handle them.
>
> But since CRAN does not appear to require requirements engineering to be
> completed along with a comprehensive suite of QA tests, there is no possible
> way they can offer any guarantees or even recommendations that any package
> on CRAN is good quality. From the current reaction to mere notes, I can
> imagine the reaction that would arise should they ever decide to do so. It
> is very much up to the 'consumer' to search CRAN, and evaluate each
> interesting package to ensure it works as advertised, and I have no doubt
> that some are gems while others are best avoided.
>
> Just my $0.02 ...
>
> Cheers
>
> Ted
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 31 Mar 2012 - 17:59:16 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 01 Apr 2012 - 03:40:37 GMT.
