Re: [Rd] Comments in the DESCRIPTION file

From: Hervé Pagès <hpages_at_fhcrc.org>
Date: Fri, 07 Dec 2012 17:10:09 -0800

Hi Simon,

On 12/06/2012 05:59 PM, Simon Urbanek wrote:
> On Dec 6, 2012, at 8:36 PM, Hervé Pagès wrote:
>
>> On 12/06/2012 04:53 PM, William Dunlap wrote:
>>> Why not just use some tag that R doesn't already use, say "Comment:", instead
>>> of a #? If you allow # in position one of a line to mean a comment then people
>>> may expect # to be used as a comment anywhere on a line.
>>
>> I would stick to whatever the DCF spec says, if there is such a thing.
>> If the spec says # in position 1 means a comment then I think read.dcf()
>> should do that. Then the function can be used to read any DCF file,
>> not just DESCRIPTION files.
>>
>
> DCF itself doesn't define the meaning of # -- it only defines that no field name is allowed to start with #. In fact the same document says that lines starting with # are not permitted in general DCF files -- they are only permitted in Debian's source package control files. That leaves the status of # as comments somewhat confusing. My interpretation would be that generic DCF doesn't allow # but specific formats derived from DCF may choose to interpret it that way. In either case the current behavior of read.dcf() definitely satisfies the DCF definition.

Not if the definition says that no field name is allowed to start with #:

   > read.dcf("toto.dcf")
        #Package Version
   [1,] "toto"   "0.0.0"
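
To reproduce the above, here is a minimal sketch. The original toto.dcf is not shown in this thread, so the file contents below are an assumption (a single record whose first field name starts with #):

```r
# Assumed stand-in for toto.dcf: one record, first field name starts with '#'.
dcf_file <- tempfile(fileext = ".dcf")
writeLines(c("#Package: toto", "Version: 0.0.0"), dcf_file)

# read.dcf() treats "#Package" as an ordinary field name.
x <- read.dcf(dcf_file)
colnames(x)

unlink(dcf_file)
```

The point is only that the leading # is kept as part of the field name rather than rejected or treated as a comment.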

> As both Brian and Bill pointed out, the proper way to do that is to define a data field with data/value as the comment.

which maybe works OK for inserting comments in DESCRIPTION files, but not so well for inserting inter-record comments in DCF files with multiple records.

In Bioconductor we maintain a big DCF file that we use to automatically re-generate a collection of annotation packages at each release. The file looks like:

# Annotation packages for Human

Package: hcg110.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

Package: hgfocus.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

# Annotation packages for Mouse

Package: mgu74a.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

Package: mgu74av2.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

The problem with putting those comments in key/value pairs is that they contaminate the output of read.dcf() with fake records:

> read.dcf("toto.dcf")
     Note                            Package       Version PkgTemplate
[1,] "Annotation packages for Human" NA            NA      NA
[2,] NA                              "hcg110.db"   "2.8.0" "NCBICHIP.DB"
[3,] NA                              "hgfocus.db"  "2.8.0" "NCBICHIP.DB"
[4,] "Annotation packages for Mouse" NA            NA      NA
[5,] NA                              "mgu74a.db"   "2.8.0" "NCBICHIP.DB"
[6,] NA                              "mgu74av2.db" "2.8.0" "NCBICHIP.DB"

The file really has 4 records of data and it'd be good to be able to add inter-record comments without altering the number of records.
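
For illustration, here is a rough sketch of the workaround one is left with when the comments are stored as "Note:" records: read everything, then drop the rows that have no Package value. The field name "Note" is taken from the output above. Reading from a text connection instead of a file is just a convenience for the example.

```r
# The same data as above, with the comments written as "Note:" records.
dcf_lines <- c(
  "Note: Annotation packages for Human", "",
  "Package: hcg110.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: hgfocus.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Note: Annotation packages for Mouse", "",
  "Package: mgu74a.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: mgu74av2.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB"
)

con <- textConnection(dcf_lines)
x <- read.dcf(con)
close(con)

# Every "Note:" paragraph became a record of its own, so filter them out
# and drop the Note column to get back to the real data.
is_real <- !is.na(x[, "Package"])
records <- x[is_real, c("Package", "Version", "PkgTemplate"), drop = FALSE]
nrow(records)
```

This works, but every consumer of the file has to know about the filtering step, which is what I would like to avoid.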

This is the reason why we use a "comment aware" version of read.dcf().

I can see why you might not want people to start using # to insert comment lines in their DESCRIPTION file, and I agree that it should probably be discouraged. So maybe support for # comments could be made optional in read.dcf() through an extra argument that is disabled by default?
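
To make the idea concrete, here is a minimal sketch of what such an opt-in could look like as a wrapper. The function name read_dcf_with_comments and the comment.char argument are hypothetical. Neither exists in base R, and this is not a proposed patch to read.dcf() itself. It assumes 'file' is a file path and that the file is small enough to read in one go.

```r
# Hypothetical wrapper: 'comment.char' is NOT an argument of base read.dcf().
# With the default (""), the behavior is exactly that of read.dcf().
read_dcf_with_comments <- function(file, ..., comment.char = "") {
  if (!nzchar(comment.char))
    return(read.dcf(file, ...))
  lines <- readLines(file)
  # Only lines that start with the comment character are dropped;
  # a '#' elsewhere on a line is left alone.
  keep <- !startsWith(lines, comment.char)
  con <- textConnection(lines[keep])
  on.exit(close(con))
  read.dcf(con, ...)
}

# Usage on a small file with inter-record comments.
dcf_file <- tempfile(fileext = ".dcf")
writeLines(c(
  "# Annotation packages for Human", "",
  "Package: hcg110.db", "Version: 2.8.0", "",
  "# Annotation packages for Mouse", "",
  "Package: mgu74a.db", "Version: 2.8.0"
), dcf_file)

x <- read_dcf_with_comments(dcf_file, comment.char = "#")
nrow(x)
```

Going through a text connection also avoids the temporary file that my read.dcf2() needs.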

Thanks,
H.

>
> Cheers,
> Simon
>
>
>
>> Cheers,
>> H.
>>
>>>
>>> (It may also mess up some dcf parsing code that I've written - it checks that lines
>>> after tagged lines are either empty, the start of a new description, or start with a space,
>>> a continuation of the previous line.)
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces_at_r-project.org [mailto:r-devel-bounces_at_r-project.org] On Behalf
>>>> Of Hervé Pagès
>>>> Sent: Thursday, December 06, 2012 3:47 PM
>>>> To: Duncan Murdoch
>>>> Cc: christophe.genolini_at_u-paris10.fr; r-devel_at_r-project.org; Christophe Genolini
>>>> Subject: Re: [Rd] Comments in the DESCRIPTION file
>>>>
>>>>
>>>>
>>>> On 12/06/2012 03:41 PM, Hervé Pagès wrote:
>>>>> Hi,
>>>>>
>>>>> Wouldn't be hard to patch read.dcf() though.
>>>>>
>>>>> FWIW here's the "comment aware" version of read.dcf() I've been using
>>>>> for years:
>>>>>
>>>>> .removeCommentLines <- function(infile=stdin(), outfile=stdout())
>>>>> {
>>>>>     ## Open (and later close) only the connections created here.
>>>>>     if (is.character(infile)) {
>>>>>         infile <- file(infile, "r")
>>>>>         on.exit(close(infile))
>>>>>     }
>>>>>     if (is.character(outfile)) {
>>>>>         outfile <- file(outfile, "w")
>>>>>         on.exit(close(outfile), add=TRUE)
>>>>>     }
>>>>>     ## Copy the input in chunks, dropping lines that start with "#".
>>>>>     while (TRUE) {
>>>>>         lines <- readLines(infile, n=25000L)
>>>>>         if (length(lines) == 0L)
>>>>>             return(invisible(NULL))
>>>>>         keep_it <- substr(lines, 1L, 1L) != "#"
>>>>>         writeLines(lines[keep_it], outfile)
>>>>>     }
>>>>> }
>>>>>
>>>>> read.dcf2 <- function(file, ...)
>>>>> {
>>>>>     clean_file <- file.path(tempdir(), "clean.dcf")
>>>>
>>>> mmh, would certainly be better to just use tempfile() here.
>>>>
>>>> H.
>>>>
>>>>>     .removeCommentLines(file, clean_file)
>>>>>     on.exit(file.remove(clean_file))
>>>>>     read.dcf(clean_file, ...)
>>>>> }
>>>>>
>>>>> Cheers,
>>>>> H.
>>>>>
>>>>> On 11/07/2012 01:53 AM, Duncan Murdoch wrote:
>>>>>> On 12-11-07 4:26 AM, Christophe Genolini wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Is it possible to add comments in the DESCRIPTION file?
>>>>>>
>>>>>>
>>>>>> The read.dcf function is used to read the DESCRIPTION file, and it
>>>>>> doesn't support comments. (The current Debian control format
>>>>>> description does appear to support comments with leading # markers, but
>>>>>> R's read.dcf function doesn't support these.)
>>>>>>
>>>>>> You could probably get away with something like
>>>>>>
>>>>>> #: this is a comment
>>>>>>
>>>>>> since unrecognized fields are ignored, but I think this fact is
>>>>>> undocumented so I would say it's safer to assume that comments are not
>>>>>> supported.
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel_at_r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages_at_fhcrc.org
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319
>>>>
>>>> ______________________________________________
>>>> R-devel_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages_at_fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 08 Dec 2012 - 01:21:08 GMT
