Re: [Rd] Connections to https: URLs -- IE expert help needed

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 05 Jan 2007 - 10:00:09 GMT

On Mon, 1 Jan 2007, Duncan Temple Lang wrote:

> Kurt Hornik wrote:

>>>>>>> Duncan Temple Lang writes:
>>
>>> Prof Brian Ripley wrote:
>>>> I've added to R-devel the ability to use download.file() and url() with
>>>> https: URLs, *only* if --internet2 is used on Windows.
>>>>
>>>> This uses the Internet Explorer internals, and only works if the
>>>> certificate is accepted (so e.g. does not work for
>>>> https://svn.r-project.org).
>>>>
>>>> Now I use IE (and Windows for that matter) only when really necessary, and
>>>> Firefox has simple ways to permanently accept non-verifiable certificates.
>>>> I would be grateful if someone who is much more familiar with IE could
>>>> write a note explaining how to deal with this that we could add to the
>>>> rw-FAQ.
>>>>
>>>> To forestall the inevitable question: there are no plans to add https:
>>>> support on any other platform, but it is something that would make a nice
>>>> project for a user contribution. The current internal code is based on
>>>> libxml2, and that AFAICS still does not have https: support.
>>>>
>>
>>> Generally (i.e. not in particular response to Brian but related to
>>> this thread)
>>
>> With a similar disclaimer: Brian's efforts were triggered by me asking
>> how to use url() to read R's mailing list archive files, such as
>>
>> https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz
>>
>> directly into R. Turns out we cannot ... which, in a way, is a shame
>> ("R cannot read its own web pages") :-(
>
> Indeed, it is a shame.  Although, when I process mail messages,
> I use Perl's very rich collection of modules for processing
> mail in so many different formats. And then I use RSPerl
> to control this and get the data into R pretty quickly.
> So we can do it in R and probably the delegation to
> mail-processing software is a good idea given the number of special
> cases, etc.
>
> And even if we had HTTPS in R, we would still want to deal with
> the certificate on that page, which gets us to more details.
> Which is the reason I think leaving things to libcurl,
> libwww, etc. will be best as they continue to evolve
> to handle new protocols and settings.

The issue here is the same as it ever was, that of event-loops and not blocking the R process. I think that is where the missing extensibility is, and it has been raised for at least 6 years now.

If I try to get that example URI with RCurl it

  1. blocks the R process for a long time.
  2. fails to retrieve the URI as it is unable to handle the certificate.

Can you please point us to an extension package that behaves better?

[When Kurt first sent me the example, I was surprised that wget handled it. I then checked, and wget < 1.10 does not check certificates at all.]
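
For anyone wanting to reproduce the two symptoms above, here is a minimal sketch in base R (not RCurl). The helper name try_fetch is hypothetical, and whether an https: URL can be opened at all depends on the R build and platform, as discussed earlier in the thread. The sketch only bounds the wait through the "timeout" option and turns a failure into a return value. The call still blocks while it runs, so it does not address the event-loop issue.

```r
# Hypothetical helper (not part of R or RCurl): try a download, cap the
# wait, and report failure as a value rather than stopping with an error.
try_fetch <- function(url, destfile = tempfile(), timeout = 10) {
  old <- options(timeout = timeout)    # seconds, used by R's internet code
  on.exit(options(old), add = TRUE)    # restore the previous setting
  tryCatch({
    status <- download.file(url, destfile, quiet = TRUE, mode = "wb")
    list(ok = (status == 0), file = destfile, error = NULL)
  }, error = function(e) {
    # e.g. unsupported URL scheme, rejected certificate, or timeout
    list(ok = FALSE, file = NULL, error = conditionMessage(e))
  })
}
```

Calling try_fetch("https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz") and inspecting the ok and error fields of the result shows which of the two failures occurred, if any.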

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Jan 06 03:31:33 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.