Re: [Rd] Connections to https: URLs -- IE expert help needed

From: Duncan Temple Lang <duncan_at_wald.ucdavis.edu>
Date: Sat 06 Jan 2007 - 22:54:02 GMT

Prof Brian Ripley wrote:
> On Mon, 1 Jan 2007, Duncan Temple Lang wrote:
>

>> Kurt Hornik wrote:
>>>>>>>> Duncan Temple Lang writes:
>>>> Prof Brian Ripley wrote:
>>>>> I've added to R-devel the ability to use download.file() and url() to
>>>>> https: URLs, *only* if --internet2 is used on Windows.
>>>>>
>>>>> This uses the Internet Explorer internals, and only works if the
>>>>> certificate is accepted (so e.g. does not work for
>>>>> https://svn.r-project.org).
>>>>>
>>>>> Now I use IE (and Windows for that matter) only when really necessary, and
>>>>> Firefox has simple ways to permanently accept non-verifiable certificates.
>>>>> I would be grateful if someone who is much more familiar with IE could
>>>>> write a note explaining how to deal with this that we could add to the
>>>>> rw-FAQ.
>>>>>
>>>>> To forestall the inevitable question: there are no plans to add https:
>>>>> support on any other platform, but it is something that would make a nice
>>>>> project for a user contribution.  The current internal code is based on
>>>>> libxml2, and that AFAICS still does not have https: support.
>>>>>
>>>> Generally (i.e. not in particular response to Brian but related to
>>>> this thread)
>>> With a similar disclaimer: Brian's efforts were triggered by me asking
>>> how to use url() to read R's mailing list archive files, such as
>>>
>>>   https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz
>>>
>>> directly into R.  Turns out we cannot ... which, in a way, is a shame
>>> ("R cannot read its own web pages") :-(
>> Indeed, it is a shame.  Although, when I process mail messages,
>> I use Perl's very rich collection of modules for processing
>> mail in so many different formats. And then I use RSPerl
>> to control this and get the data into R pretty quickly.
>> So we can do it in R and probably the delegation to
>> mail-processing software is a good idea given the number of special
>> cases, etc.
>>
>> And even if we had HTTPS in R, we would still want to deal with
>> the certificate on that page, which gets us to more details.
>> Which is the reason I think leaving things to libcurl,
>> libwww, etc. will be best as they continue to evolve
>> to handle new protocols and settings.

>
> The issue here is the same as it ever was, that of event-loops and not
> blocking the R process. I think that is where the missing extensibility
> is, and it has been raised for at least 6 years now.

Of course, that is one area where extensibility is needed. Attempts have been made to address this generally over the last 6 years, but the architecture of, and the focus on, the numerous current R front-ends is not necessarily ideal for trying to solve this properly.

But your sentence suggests that the extensibility of the connection API is not an issue, and we don't agree on that. I think both extensibility issues are relevant. Blocking is important, but not being able to explore or add new facilities is fundamental and, I believe, of immense importance. The lack of extensibility of the R engine at the system level, rather than in the interpreted language, is a major impediment to the evolution of R, IMHO.

>
> If I try to get that example URI with RCurl it
>
> 1) blocks the R process for a long time.
> 2) fails to retrieve the URI as it is unable to handle the certificate.

2) is, as you would put it, "user error" ;-) You need to tell libcurl what options you want in the request. Whether to ignore certificates, where the certificates are, etc. are all request-specific options.
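
A minimal sketch of passing such per-request options through RCurl follows. It assumes a current RCurl installation; fetch_https() is a hypothetical helper written for illustration, not an RCurl function, and the CA bundle path in the usage note is a placeholder. The option names ssl.verifypeer, cainfo and followlocation are RCurl's spellings of the corresponding libcurl options.

```r
library(RCurl)

# The archive file Kurt mentioned earlier in the thread.
archive_url <- "https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz"

# fetch_https() is a hypothetical helper, not part of RCurl.
#   ca_bundle: path to a PEM file of trusted CA certificates, or NULL to
#              rely on whatever defaults libcurl was built with.
#   verify:    FALSE turns off peer verification (insecure; testing only).
fetch_https <- function(url, ca_bundle = NULL, verify = TRUE) {
  opts <- list(ssl.verifypeer = verify, followlocation = TRUE)
  if (!is.null(ca_bundle)) {
    opts$cainfo <- ca_bundle
  }
  # The file is gzip-compressed, so retrieve it as raw bytes, not text.
  getBinaryURL(url, .opts = opts)
}
```

Usage would look something like fetch_https(archive_url, ca_bundle = "/path/to/cacert.pem"), with the path replaced by a real certificate bundle on your system. Whether the request succeeds still depends on the server's certificate chain and on how libcurl was built.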

>
> Can you please point us to an extension package that behaves better?
>

Well, as regards point 1), libcurl does have facilities for non-blocking calls via its multi_ interface, and RCurl exposes them through the function getURIAsynchronous() and the lower-level functions. One could also merge the basic libcurl interface into our select calls. I seem to recall that libwww has features we can also integrate manually into our event loop.
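
As a rough sketch of the getURIAsynchronous() route, assuming a current RCurl installation: the wrapper fetch_concurrently() below is hypothetical and only forwards its arguments. With the default arguments the call hands all the requests to libcurl's multi interface and returns once they have all completed, so the transfers overlap with each other but the call itself still waits. The RCurl help page documents a perform argument for returning earlier; it is not used here, so check that page before relying on it.

```r
library(RCurl)

# fetch_concurrently() is a hypothetical wrapper, not part of RCurl.
# urls: character vector of URIs to download concurrently.
# opts: named list of libcurl options applied to the requests.
fetch_concurrently <- function(urls, opts = list()) {
  getURIAsynchronous(urls, .opts = opts)
}

# Example call (needs network access, so it is left commented out):
# pages <- fetch_concurrently(
#   c("https://stat.ethz.ch/mailman/listinfo/r-devel",
#     "https://stat.ethz.ch/mailman/listinfo/r-help"),
#   opts = list(ssl.verifypeer = TRUE, followlocation = TRUE)
# )
```

Integrating this with R's own event loop, as discussed above, would take the lower-level multi-handle functions rather than this convenience call.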

The key thing I am trying to get across is that if we are going to include these things in R and we have to do things manually, then we should try to integrate them in an evolvable, extensible manner that leverages libraries that do things properly.

> [When Kurt first sent me the example, I was surprised that wget handled
> it. I then checked, and wget < 1.10 does not check certificates at all.]
>



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun Jan 07 09:56:50 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.