Re: [Rd] Retrieving data from aspx pages

From: jose ramon mazaira <jramaza_at_gmail.com>
Date: Wed, 31 Oct 2012 17:14:54 +0100

I'd like to point out that the RCurl package already provides a function, postForm(), for interacting with servers via POST requests. In fact, the RCurl FAQ includes an example that targets an aspx page:

library(RCurl)

# Submit the form fields as a POST request; x receives the response body.
x = postForm("http://www.fas.usda.gov/psdonline/psdResult.aspx",
             style = "post",
             .params = list(visited = "1",
                            lstGroup = "all",
                            lstCommodity = "2631000",
                            lstAttribute = "88",
                            lstCountry = "**",
                            lstDate = "2011",
                            lstColumn = "Year",
                            lstOrder = "Commodity%2FAttribute%2FCountry"))
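
Since the original question was also about turning the retrieved page into R objects, here is a minimal sketch of that step. It assumes the response contains an ordinary HTML table and uses the XML package (htmlParse and readHTMLTable). The sample_html string and its column names are made up for illustration; the real layout of the page is not shown in this thread.

```r
library(XML)

# Hypothetical stand-in for the HTML string returned by postForm();
# the table contents and column names are invented for illustration.
sample_html <- "<html><body><table>
<tr><th>Country</th><th>Year</th><th>Value</th></tr>
<tr><td>Brazil</td><td>2011</td><td>100</td></tr>
<tr><td>India</td><td>2011</td><td>250</td></tr>
</table></body></html>"

# Parse the string as HTML and read every <table> into a data frame.
doc <- htmlParse(sample_html, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Take the first table and convert the numeric column explicitly.
result <- tables[[1]]
result$Value <- as.numeric(result$Value)
print(result)
```

With a real response, one would pass x in place of sample_html and then pick the relevant table out of the returned list, which probably takes some inspection for each site.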

See http://www.omegahat.org/RCurl for details.

However, I think it would be more useful to automate the interaction with servers by retrieving the name-value pairs the server requires automatically (by parsing the page source), instead of inspecting each web page by hand to find the appropriate fields.
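
To make the idea concrete, here is a rough sketch of collecting the name-value pairs from a form with the XML package. The page_source string, the field names (txtIssuer, lstType, btnSearch) and the helper form_fields() are all hypothetical; a real page would be fetched first (for example with RCurl's getURL) and will have its own fields. ASP.NET pages typically carry hidden state fields such as __VIEWSTATE that need to be sent back unchanged, which is the main reason to harvest the defaults rather than type them in.

```r
library(XML)

# Hypothetical page source standing in for a fetched ASP.NET form.
page_source <- '<html><body><form method="post" action="Search.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="text" name="txtIssuer" value="" />
<select name="lstType">
<option value="corp" selected="selected">Corporate</option>
<option value="muni">Municipal</option>
</select>
<input type="submit" name="btnSearch" value="Search" />
</form></body></html>'

# Hypothetical helper: return a named list of default form values.
form_fields <- function(html) {
  doc <- htmlParse(html, asText = TRUE)

  # <input> elements: take the name and the current value (empty if absent).
  inputs <- getNodeSet(doc, "//form//input[@name]")
  fields <- lapply(inputs, function(node) xmlGetAttr(node, "value", default = ""))
  names(fields) <- vapply(inputs, function(node) xmlGetAttr(node, "name"),
                          character(1))

  # <select> elements: take the selected option, else the first option.
  selects <- getNodeSet(doc, "//form//select[@name]")
  for (i in seq_along(selects)) {
    node <- selects[[i]]
    chosen <- getNodeSet(node, ".//option[@selected]")
    if (length(chosen) == 0) chosen <- getNodeSet(node, ".//option")
    value <- if (length(chosen) > 0) {
      xmlGetAttr(chosen[[1]], "value", default = "")
    } else {
      ""
    }
    fields[[xmlGetAttr(node, "name")]] <- value
  }
  fields
}

params <- form_fields(page_source)
params$txtIssuer <- "AT&T"  # override one default with the user's choice
str(params)
```

The resulting list could then be handed to postForm() through its .params argument. Whether that is enough for a given site is an open question, since some pages also depend on cookies or on JavaScript-generated fields.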

2012/10/30, Paul Gilbert <pgilbert902_at_gmail.com>:
> Jose
>
> As far as getting to the data, I think the best way to do this sort of
> thing would be if the site supports a SOAP or REST interface. When they
> don't (yet) then one is faced with clicking through some pages. Python
> or Java is one way to automate the process of clicking through the
> pages. I don't know how to do that in R, but would like to know if it is
> possible.
>
> But, I guess I was confused about the part you want to improve. What I
> have works fairly smoothly parsing and passing back JSON data, converted
> from a csv file, into R. The downside is that this approach requires
> more than R to be installed on the client machine. But if the object you
> get back is ASPX, then you either need to parse it directly, or convert
> it to JSON, or something else you can deal with. I suspect that will be
> fairly specific to a particular web site, but I don't really know enough
> about ASPX to be sure.
>
> Paul
>
> On 12-10-30 01:12 PM, jose ramon mazaira wrote:
>> Thanks for your interest, Paul.
>> I've checked the source code of TSjson and I've seen that what it does
>> is to call a Python script to retrieve the data. In fact, I've already
>> done this with Java using the URLConnection class and sending the
>> requested values to fill the form.
>> However, I think it would be more useful to open a connection with R
>> and to send the requested values within R, and not through an external
>> program.
>> The application I've designed, like yours, is also page-specific
>> (i.e., designed for
>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx),
>> but I think that our applications would be more powerful if they were
>> able to parse the name-value pairs generated from ASPX (or of any
>> other dynamically generated web page) and ask the user to select the
>> appropriate values.
>>
>> 2012/10/30, Paul Gilbert <pgilbert902_at_gmail.com>:
>>> I think RHTMLForms works if you have a single form, but I have not been
>>> able to see how to use it when you need to go through a sequence of
>>> dynamically generated forms (like you can do with Python mechanize).
>>>
>>> Paul
>>>
>>> On 12-10-30 09:08 AM, Gabriel Becker wrote:
>>>> I haven't used it extensively myself, and can't speak to its current
>>>> state, but on quick inspection RHTMLForms seems worth a look for what
>>>> you want.
>>>>
>>>> http://www.omegahat.org/RHTMLForms/
>>>>
>>>> ~G
>>>>
>>>> On Tue, Oct 30, 2012 at 5:38 AM, Paul Gilbert <pgilbert902_at_gmail.com> wrote:
>>>>
>>>> I don't know of an easy way to do this in R. I've been doing
>>>> something similar with python scripts called from R. If anyone knows
>>>> how to do this with just R, I would appreciate hearing too.
>>>>
>>>> Paul
>>>>
>>>>
>>>> On 12-10-29 04:11 PM, jose ramon mazaira wrote:
>>>>
>>>> Hi. I'm trying to write an application to retrieve financial data
>>>> (especially bond data) from FINRA. The web page is served dynamically
>>>> from an ASP.NET application:
>>>>
>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx
>>>>
>>>> I'd like to know if it's possible to fill in the web page form
>>>> dynamically from R and, after filling it in (with the issuer name),
>>>> retrieve the web page, parse the data, and convert it to appropriate
>>>> R objects. For example, suppose I want to search for data on AT&T
>>>> bonds. I'd like to know if it's possible, within R, to fill in the
>>>> page served from:
>>>>
>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/AdvancedScreener.aspx
>>>>
>>>> select the "corporate" option, enter AT&T in the "Issuer name" field,
>>>> ask the page to display the results, retrieve the results for each of
>>>> the bonds issued by AT&T (for example:
>>>>
>>>> http://cxa.gtm.idmanagedsolutions.com/finra/BondCenter/BondDetail.aspx?ID=MDAxOTU3Qko3)
>>>>
>>>> and parse the data from the web page.
>>>>
>>>> Thanks in advance.
>>>>
>>>> ________________________________________________
>>>> R-devel_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>> --
>>>> Gabriel Becker
>>>> Graduate Student
>>>> Statistics Department
>>>> University of California, Davis
>>>>
>>>
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 01 Nov 2012 - 13:15:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 01 Nov 2012 - 19:10:49 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
