[R] "Special" characters in URI

From: Gorjanc Gregor <Gregor.Gorjanc_at_bfro.uni-lj.si>
Date: Tue 03 May 2005 - 09:46:22 EST


Hello!

I am crossposting this to R-help and BioC, since it is relevant to both groups.

I wrote a wrapper for the Entrez search utility (link below), which can add some new search functionality to the existing code in Bioconductor's package 'annotate'*.

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

The Entrez search utility returns an XML document, but I have a problem retrieving that document through a URI, since the URI can contain characters that should not appear there according to

http://www.faqs.org/rfcs/rfc2396.html

I encountered problems with "[" and "]" as well as with the space character. There might also be problems with other characters, e.g. the reserved characters in URI syntax.
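To make the character classes concrete, here is a minimal sketch in base R that sorts the characters of a search term into the "unreserved" and "reserved" sets listed in RFC 2396 (sections 2.2 and 2.3) and labels everything else "excluded". The helper name classifyChars is hypothetical and not part of 'annotate' or 'XML'.

```r
# Character sets as listed in RFC 2396, sections 2.2 (reserved) and 2.3 (unreserved).
reserved   <- strsplit(";/?:@&=+$,", "")[[1]]
mark       <- strsplit("-_.!~*'()", "")[[1]]
unreserved <- c(letters, LETTERS, 0:9, mark)

# classifyChars is a hypothetical helper: it labels each character of x
# as "unreserved", "reserved", or "excluded" (anything in neither set).
classifyChars <- function(x) {
  ch  <- strsplit(x, "")[[1]]
  cls <- ifelse(ch %in% unreserved, "unreserved",
         ifelse(ch %in% reserved, "reserved", "excluded"))
  data.frame(char = ch, class = cls, stringsAsFactors = FALSE)
}

res <- classifyChars("gorjanc g[au]")
# Show only the characters that would need attention in a URI
res[res$class != "unreserved", ]
```

For the term used in this post, only the space and the two brackets fall outside both sets.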

My R example is:

R> library("annotate")
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor

         Vignettes contain introductory material.  To view, 
         simply type: openVignette() 
         For details on reading vignettes, see
         the openVignette help page.
R> library(XML)
R> tmp <- list()
R> tmp$term <- "gorjanc g[au]"
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
R> tmp
$term
[1] "gorjanc g[au]"

$URL
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"

R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]

# so I have a problem with space and [ and ]
# let's reduce the problem to just space or [] to be sure
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g

R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]

# now show that it works fine without special chars
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file

[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"

$version

[1] "1.0"

$children

...

# now show a workaround for space
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file

[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"

$version

[1] "1.0"

$children

...
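The "%20" in the workaround above is simply "%" followed by the hexadecimal byte value of the space character. A minimal sketch of how such an escape sequence can be derived in base R follows. The helper pctEncode and the variables base, term and url are hypothetical names used only for illustration, and no request is sent to the server.

```r
# pctEncode is a hypothetical helper, not part of 'annotate' or 'XML'.
# charToRaw() gives the byte value(s) of the character and
# as.character() renders each byte as two hexadecimal digits.
pctEncode <- function(ch) {
  paste0("%", toupper(as.character(charToRaw(ch))), collapse = "")
}

pctEncode(" ")
pctEncode("[")
pctEncode("]")

# Build the query URL, escaping only the space as in the workaround above
base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
term <- "gorjanc g"
url  <- paste0(base, "?term=", gsub(" ", pctEncode(" "), term, fixed = TRUE))
url
```

Escaping only the search term, and not the whole URL, leaves the "?" and "=" that structure the query untouched.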

As can be seen above, it is possible to handle these special characters, and I wonder whether this has already been done somewhere. If not, I thought of writing a function fixURLchar, which would replace reserved characters with their escape sequences. Any comments, pointers, ... ?

from = c(" ", "\"", ",", "#"),
to = c("%20", "%22", "%2c", "%23"))
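The fragment above shows only the default 'from' and 'to' vectors. A minimal sketch of how a function like fixURLchar could use them is given here, assuming a first argument named URL and a plain literal replacement of each 'from' entry by the matching 'to' entry. Both the argument name and the body are assumptions, not the code from the original message.

```r
# Sketch only: the argument name 'URL' and the function body are assumptions.
# The default 'from' and 'to' vectors are the ones shown in the post.
fixURLchar <- function(URL,
                       from = c(" ", "\"", ",", "#"),
                       to   = c("%20", "%22", "%2c", "%23")) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # fixed = TRUE treats from[i] as a literal string, not a regular expression
    URL <- gsub(from[i], to[i], URL, fixed = TRUE)
  }
  URL
}

fixURLchar("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g")

# The defaults do not cover "[" and "]"; they can be passed explicitly
fixURLchar("gorjanc g[au]",
           from = c(" ", "[", "]"),
           to   = c("%20", "%5B", "%5D"))
```

If "%" itself were ever added to 'from', it would have to be replaced first, otherwise the escapes inserted by earlier replacements would be escaped again. Current versions of R also provide URLencode() in package 'utils', which with reserved = TRUE escapes spaces and brackets and may make a hand-written table unnecessary.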

*Once I solve this problem, I will send my code to the 'annotate' maintainer, who can include it in the package if he wishes.

Lep pozdrav / With regards,

    Gregor Gorjanc



University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe

"One must learn by doing the thing; for though you think you know it,  you have no certainty until you try." Sophocles ~ 450 B.C.

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue May 03 09:50:10 2005
