Re: [Rd] xmlParseDoc parser errors

From: Duncan Temple Lang <dtemplelang_at_ucdavis.edu>
Date: Fri, 16 Nov 2012 07:29:58 -0800

On 11/16/12 6:10 AM, bryan rasmussen wrote:
> Hi,
>
> I have some XML files that have a processing instruction directly
> after the XML declaration
>
> when I do
> kgroup.reading <- character(0)
> for (file in file_list){kgroup.reading <-
> xmlParseDoc(file.path("c:","projects","respositories","dk","004",file))}
>
> I get the error
> file name :1: parser error : Start tag expected, '<' not found

That particular error message most commonly appears when the first argument (the file name) is treated as the XML content itself because the file does not actually exist. When you are reading from a file rather than from XML content held in a character vector, you can pass asText = FALSE to xmlParseDoc() or xmlParse() so that the function never tries to treat the file name as content, e.g.

     xmlParseDoc("~/pis.xml", asText = FALSE)

Then, if the file is missing, you would get a message such as

    I/O warning : failed to load external entity "/Users/duncan/pis.xml"
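
A minimal sketch of checking the paths before parsing, so that a missing file is reported directly rather than being mistaken for XML content. The directory and file names are hypothetical stand-ins for the ones in the question; only file.exists(), file.path() and xmlParseDoc() with asText = FALSE are taken from the discussion above.

```r
library(XML)

# Hypothetical directory and file names, standing in for those in the question.
xml_dir <- tempdir()
file_list <- c("a.xml", "b.xml")
paths <- file.path(xml_dir, file_list)

# Report any path that does not exist before handing it to the parser.
not_found <- paths[!file.exists(paths)]
if (length(not_found) > 0) {
  message("Not found: ", paste(not_found, collapse = ", "))
}

# Parse only the files that exist, and never treat a path as XML text.
docs <- lapply(paths[file.exists(paths)], xmlParseDoc, asText = FALSE)
```

Collecting the results in a list also keeps every parsed document, whereas assigning to the same variable inside a for loop keeps only the last one.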

>
> When I remove the processing instruction and try to load it again I do
> not get the parser error.
>
> This is of course understandable because of
>
> [Definition: Processing instructions (PIs) allow documents to contain
> instructions for applications.]
> Processing Instructions
> [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
> [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
>
> PIs are not part of the document's character data, but MUST be passed
> through to the application. The PI begins with a target (PITarget)
> used to identify the application to which the instruction is directed.
> The target names " XML ", " xml ", and so on are reserved for
> standardization in this or future versions of this specification. The
> XML Notation mechanism may be used for formal declaration of PI
> targets. Parameter entity references MUST NOT be recognized within
> processing instructions.
>
> from the specification, on the other hand it does not say that it is
> never allowed for any PI given that they (the W3C) are planning to use
> it for 'standardization in this or future versions of this
> specification'
>
> Unfortunately the people who made the xml-model processing instruction
>
> http://www.w3.org/TR/2012/NOTE-xml-model-20121009/#the-xml-model-processing-instruction
>
> I guess decided they had the right to standardize a processing
> instruction name.
>
> Is there any way to get around this problem?

Firstly, xmlParseDoc() _will_ parse a processing instruction. The error is not due to its presence.

You will get a warning because the target of the processing instruction starts with the string xml. Since you are using xmlParseDoc(), you can tell the parser not to emit warnings via

  xmlParseDoc("filename", NOWARNING)
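
As a sketch, the same option can be tried on the string from the question. This assumes NOWARNING is passed as the options argument of xmlParseDoc() and that the content is supplied as text; the variable names are illustrative only.

```r
library(XML)

# The document from the question, held as a character string.
txt <- paste0(
  '<?xml version="1.0" encoding="utf-8"?>',
  '<?xml-model href="urn:publicid:-:Thomson+Information+AS:DTD+AFGRDOK:DK"?>',
  '<t></t>'
)

# asText = TRUE says explicitly that txt is XML content, not a file name;
# NOWARNING asks the parser not to report warnings.
doc <- xmlParseDoc(txt, options = NOWARNING, asText = TRUE)

# Show the name of the root node the parser found.
print(xmlName(xmlRoot(doc)))
```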

>
> Also When I do the following:
>
> t <- '<?xml version="1.0" encoding="utf-8"?><?xml-model
> href="urn:publicid:-:Thomson+Information+AS:DTD+AFGRDOK:DK"?><t></t>'
>> xmlParseDoc(t)
> I get the parser warning
>
> <?xml version="1.0" encoding="utf-8"?><?xml-model
> href="urn:publicid:-:Thomson+Information+AS:DTD+AFGRDOK:DK"?><t></t>:1:
> parser warning : xmlParsePITarget: invalid name prefix 'xml'
> <?xml version="1.0" encoding="utf-8"?><?xml-model href="urn:publicid:-:Thomson+I
>
> Why do I get it as a parser error when I load the document, but a
> parser warning when I load it as a string?
> Anyway just to get it as a warning when I load the document?

Yes, it just works. I believe the file was somehow not available at that path initially, and that editing it made it available.
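
One way to check this for yourself, sketched under the assumption that a temporary file is an acceptable stand-in for your real one: write the document from your example to disk and parse it from there. The temporary path and variable names are made up for illustration.

```r
library(XML)

# The document from the question, including the xml-model processing instruction.
xml_text <- paste0(
  '<?xml version="1.0" encoding="utf-8"?>',
  '<?xml-model href="urn:publicid:-:Thomson+Information+AS:DTD+AFGRDOK:DK"?>',
  '<t></t>'
)

# Write it to a temporary file so that the path is known to exist.
path <- tempfile(fileext = ".xml")
writeLines(xml_text, path)

# Parse from the file; asText = FALSE rules out treating the path as content.
doc <- xmlParseDoc(path, options = NOWARNING, asText = FALSE)
print(doc)

# Remove the temporary file again.
unlink(path)
```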

>
> Thanks,
> Bryan Rasmussen
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 16 Nov 2012 - 15:35:05 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 18 Nov 2012 - 14:00:54 GMT.
