Re: [Rd] Retrieving data from aspx pages

From: jose ramon mazaira <jramaza_at_gmail.com>
Date: Thu, 01 Nov 2012 00:38:35 +0100

Yes, you're right, I was wrong. I've been trying to use the postForm method in RCurl but it doesn't work. I'll check the source code in this package to see if I can improve it.
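
On the idea raised further down the thread, of retrieving the name-value pairs by parsing the page source rather than reading each page by hand, a minimal sketch in base R might look like the following. Everything in it is an assumption for illustration: the HTML snippet is a hypothetical stand-in for what a server might return, txtIssuer is a made-up field name, and get_attr is a helper invented here. Regular expressions are used only so that the sketch runs without extra packages; a real page would be better handled by a proper HTML parser.

```r
# Hypothetical page source standing in for an ASPX response.
# __VIEWSTATE and __EVENTVALIDATION are typical ASP.NET hidden fields;
# the values, the txtIssuer field and the action URI are made up.
html <- paste0(
  '<form method="post" action="Result.aspx?step=2">',
  '<input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />',
  '<input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />',
  '<input type="text" name="txtIssuer" value="" />',
  '</form>')

# Hypothetical helper: return the value of one attribute from a single
# tag string, or NA if the attribute is not present.
get_attr <- function(tag, attr_name) {
  pattern <- paste0('[[:space:]]', attr_name, '="([^"]*)"')
  m <- regmatches(tag, regexec(pattern, tag))[[1]]
  if (length(m) == 2) m[2] else NA_character_
}

# Collect every <input> tag, then build a named vector of name-value pairs.
tags <- regmatches(html, gregexpr('<input[^>]*>', html))[[1]]
fields <- vapply(tags, get_attr, character(1),
                 attr_name = "value", USE.NAMES = FALSE)
names(fields) <- vapply(tags, get_attr, character(1),
                        attr_name = "name", USE.NAMES = FALSE)

# The form's action attribute is often where the next POST has to go.
form_tag <- regmatches(html, regexpr('<form[^>]*>', html))
action <- get_attr(form_tag, "action")

# Fill in the one field the user has to supply; the hidden fields are
# passed back unchanged.
fields[["txtIssuer"]] <- "AT&T"

print(fields)
print(action)

# With RCurl loaded, the pairs could then be sent back along these lines
# (not run here, since it needs a live server):
#   postForm(next_uri, .params = as.list(fields), style = "post")
```

Here next_uri is also hypothetical: it would have to be built by resolving action against the address of the page that was just fetched.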

2012/10/31, Paul Gilbert <pgilbert902_at_gmail.com>:
> I must be really dense, I know RCurl provides a POST capability, but I
> don't see how this allows "interaction". Suppose the example actually
> worked, which it does not. (Unfortunately many of the examples in RCurl
> seem to be marked \dontrun{} or disabled with some if() condition.) When
> you post to a page like this you will often get something back that has
> a dynamically generated URI, and you will need to post more information
> to that page. But how do you find out the URI of that next dynamically
> generated page? Even when you know what you will need to post, you need
> the URI to do it. If RCurl provided interaction you would be able to get
> the URI so you could post to the next page. Maybe you can do that, but I
> have not discovered how. If you know how, I would appreciate a real
> working example.
>
> Paul
>
> On 12-10-31 12:14 PM, jose ramon mazaira wrote:
>> I'd like to point out that I've discovered that the RCurl package
>> already provides a utility that allows interaction with servers via
>> POST requests. In fact, the RCurl FAQ specifically contains an
>> example with an aspx page:
>>
>> library(RCurl)
>> x = postForm("http://www.fas.usda.gov/psdonline/psdResult.aspx",
>>              style = "post",
>>              .params = list(visited = "1",
>>                             lstGroup = "all",
>>                             lstCommodity = "2631000",
>>                             lstAttribute = "88",
>>                             lstCountry = "**",
>>                             lstDate = "2011",
>>                             lstColumn = "Year",
>>                             lstOrder = "Commodity%2FAttribute%2FCountry"))
>>
>> Check this link: http://www.omegahat.org/RCurl
>> However, I think it would be more useful to automate the interaction
>> with servers by automatically retrieving the name-value pairs the
>> server requires (by parsing the page source) instead of examining
>> each web page by hand for the appropriate fields.
>>
>> 2012/10/30, Paul Gilbert <pgilbert902_at_gmail.com>:
>>> Jose
>>>
>>> As far as getting to the data, I think the best way to do this sort of
>>> thing would be if the site supports a SOAP or REST interface. When they
>>> don't (yet) then one is faced with clicking through some pages. Python
>>> or Java is one way to automate the process of clicking through the
>>> pages. I don't know how to do that in R, but would like to know if it is
>>> possible.
>>>
>>> But, I guess I was confused about the part you want to improve. What I
>>> have works fairly smoothly parsing and passing back JSON data, converted
>>> from a csv file, into R. The downside is that this approach requires
>>> more than R to be installed on the client machine. But if the object you
>>> get back is ASPX, then you either need to parse it directly, or convert
>>> it to JSON, or something else you can deal with. I suspect that will be
>>> fairly specific to a particular web site, but I don't really know enough
>>> about ASPX to be sure.
>>>
>>> Paul
>>>
>>> On 12-10-30 01:12 PM, jose ramon mazaira wrote:
>>>> Thanks for your interest, Paul.
>>>> I've checked the source code of TSjson and I've seen that what it does
>>>> is to call a Python script to retrieve the data. In fact, I've already
>>>> done this with Java using the URLConnection class and sending the
>>>> requested values to fill the form.
>>>> However, I think it would be more useful to open a connection with R
>>>> and to send the requested values within R, and not through an external
>>>> program.
>>>> The application I've designed, like yours, is also page-specific
>>>> (i.e., designed for
>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx),
>>>> but I think that our applications would be more powerful if they were
>>>> able to parse the name-value pairs generated from ASPX (or of any
>>>> other dynamically generated web page) and ask the user to select the
>>>> appropriate values.
>>>>
>>>> 2012/10/30, Paul Gilbert <pgilbert902_at_gmail.com>:
>>>>> I think RHTMLForms works if you have a single form, but I have not
>>>>> been able to see how to use it when you need to go through a
>>>>> sequence of dynamically generated forms (like you can do with
>>>>> Python mechanize).
>>>>>
>>>>> Paul
>>>>>
>>>>> On 12-10-30 09:08 AM, Gabriel Becker wrote:
>>>>>> I haven't used it extensively myself, and can't speak to its current
>>>>>> state, but on quick inspection RHTMLForms seems worth a look for
>>>>>> what you want.
>>>>>>
>>>>>> http://www.omegahat.org/RHTMLForms/
>>>>>>
>>>>>> ~G
>>>>>>
>>>>>> On Tue, Oct 30, 2012 at 5:38 AM, Paul Gilbert <pgilbert902_at_gmail.com> wrote:
>>>>>>
>>>>>> I don't know of an easy way to do this in R. I've been doing
>>>>>> something similar with python scripts called from R. If anyone
>>>>>> knows how to do this with just R, I would appreciate hearing too.
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>>
>>>>>> On 12-10-29 04:11 PM, jose ramon mazaira wrote:
>>>>>>
>>>>>> Hi. I'm trying to write an application to retrieve financial data
>>>>>> (especially bond data) from FINRA. The web page is served
>>>>>> dynamically from an asp.net application:
>>>>>>
>>>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx
>>>>>>
>>>>>> I'd like to know if it's possible to fill in the web page form
>>>>>> dynamically from R and, after filling it in (with the issuer name),
>>>>>> retrieve the web page, parse the data, and convert it to appropriate
>>>>>> R objects.
>>>>>> For example, suppose I want to search for data on AT&T bonds. I'd
>>>>>> like to know if it's possible, within R, to fill in the page served
>>>>>> from:
>>>>>>
>>>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx
>>>>>>
>>>>>> select the "corporate" option, fill in the "Issuer name" field with
>>>>>> AT&T, ask the page to display the results, retrieve the results for
>>>>>> each of the bonds issued by AT&T (for example:
>>>>>>
>>>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/BondDetail.aspx?ID=MDAxOTU3Qko3)
>>>>>>
>>>>>> and parse the data from the web page.
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> ________________________________________________
>>>>>> R-devel_at_r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Gabriel Becker
>>>>>> Graduate Student
>>>>>> Statistics Department
>>>>>> University of California, Davis
>>>>>>
>>>>>
>>>
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 01 Nov 2012 - 13:13:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 01 Nov 2012 - 13:20:49 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
