Re: [R] URL Scan

From: Barry Rowlingson
Date: Mon, 18 Apr 2011 10:26:25 +0100

On Sun, Apr 17, 2011 at 11:56 PM, jmsc wrote:
> The site does not require a login/password. Another way to access the first
> site would be to go to the second site, click Connecticut, click Canterbury,
> CT, enter the online database, click search under Query by Location with
> nothing in the search fields, and click the first property. Viewing the
> frame source on this page redirects to the second site.

 It doesn't require a login/password, but it does use session cookies to simulate a logged-in user (there's even a log out button that clears the session).

> Also, could you direct me to or give me some instructions on scanning from
> sites that do require a login/password? Thanks.

 I had a quick look for R-help posts on this (RSiteSearch("cookies"), RSiteSearch("session"), etc.) but didn't find much. You probably want to install the RCurl package and look at its examples.

 Generally what happens is that a successful login, or in this case just visiting the database front page, causes the web server to send back a 'cookie' with a long ID number in it. For every further access to that web site your browser includes the cookie. The server then looks up the ID, goes 'yup, this is a valid session', and sends you the page you want. If the cookie isn't there, or the ID isn't valid (and the ID numbers are big enough to make guessing impractical), then you get the default page.
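
 To make that concrete, here is a rough sketch of the exchange as a toy simulation with no real network traffic. It is written in Python only to keep it self-contained; it is not the site's actual code and it is not RCurl. The names Server, Client, the "/front" path and the "/property/1" path are all made up for illustration.

```python
import secrets


class Server:
    """Toy web server that tracks sessions by ID (hypothetical, for illustration)."""

    def __init__(self):
        self.sessions = set()  # IDs the server currently treats as valid

    def handle(self, path, cookie=None):
        """Return (page_body, cookie_to_set_or_None) for one request."""
        if path == "/front":
            # Visiting the front page starts a session: issue a long random ID.
            session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
            self.sessions.add(session_id)
            return "front page", session_id
        if cookie in self.sessions:
            # Known ID: 'yup, this is a valid session', send the requested page.
            return f"content of {path}", None
        # Missing or unknown cookie: fall back to the default page.
        return "default page", None

    def logout(self, cookie):
        """Clear the session, like the site's log out button."""
        self.sessions.discard(cookie)


class Client:
    """Toy browser that stores the cookie and resends it on every request."""

    def __init__(self, server):
        self.server = server
        self.cookie = None

    def get(self, path):
        body, set_cookie = self.server.handle(path, self.cookie)
        if set_cookie is not None:
            self.cookie = set_cookie  # remember the session ID for later requests
        return body


server = Server()
browser = Client(server)
print(browser.get("/property/1"))  # no cookie yet
print(browser.get("/front"))       # server hands out a session cookie
print(browser.get("/property/1"))  # cookie is sent back, page is served
```

 A scraper has to do what the toy Client does: fetch the front page (or log in) first, keep whatever cookie comes back, and send it with every later request.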

Barry

Received on Mon 18 Apr 2011 - 09:32:52 GMT

