Re: [Rd] Post CGI forms with built-in R function?

From: Mike Schaffer <mschaff_at_bu.edu>
Date: Thu 20 Jul 2006 - 15:01:17 GMT

Thanks Duncan. I figured there was some limit.

Your suggestion to check out the httpRequest code has me headed in the right direction, but I am having problems with the data returned from the socketConnection. Some of the returned data appears to be improperly decoded. I don't know whether I've stumbled on a low-level socket bug or whether I need to error-check the results myself. Can anyone figure out the problem in the code below?

This was run using R version 2.3.1 (2006-06-01) on both Linux and OS X with the same results.

# First, the correctly formatted data for comparison is returned by the code below. As we've established, this works fine, but I can't use it with long URIs:

full.url<-"http://genomics11.bu.edu/cgi-bin/Tractor_dev/external/get_msa.cgi?user_id=0&table=seqs_ucsc_hg18&len=350&gene_set_ids=NM_000029,NM_000064,NM_000066&orgs=Hs,mm8,canFam2"
read<-readLines(full.url)

# This returns three tab-delimited lines with a header row.

 > read
[1] "NA\tHs\tmm8\tcanFam2"

[2] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG---
CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG---

CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ 
CTGTCACACACCTAGGGAGATGCTC------------------ 
CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA----
AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- 
ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- AAGAGGTCCCAGCGT----GAGTGTCGCT---
TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- 
TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC-
CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- 
AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- 
GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- 
ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- 
AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- 
TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- 
CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- 
CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT
\tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- 
CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- 
GTGGCTGGCCTCGGC------------------------ 
TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG
GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- 
GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT"
[3] "NM_000064\tC-------------------------------------
CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------
TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- 
TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA \tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA----
CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- 
CCTTGACCC---CAAGAAAAGCCTGAGCCAAG--
CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT-
CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- 
GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG 
\t---------------------------------------------------------------------- 
--------------------------------------------------------- 
CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- 
CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- GGCAATCCCCAACAGCCTCTGCTCA"
[4] "NM_000066

\tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- --AAACTG-------------
CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA-----
CAGAGGAATAAACACCATGCTGCTACA- 
GATTTTTCATTAT----------------------------------- 
TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- 
TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT 
\t---------------------------------------------------------------------- 
------------------------------------------------------------------------ 
-------------------------------------------------------------- 
GGCTGCTT-CCATGGAATCA--------------------------------- 
CAGTTCTCACTGT-----------------------------------CCTAG--- 
GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- CTGAGAGC
\t---------------------------------------------------------------------- 
------------------------------------------- 
GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC--------------
CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT----- 
CTTGTGTAGAA------------------- 
CTTTCCATTGT----------------------------------- 
TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT-T-TCTGTC- TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG GACAGAAGCTGACAGC"

# Now, if I try to use sockets with a POST request, I get some of the correct data, but some of it is incorrect or improperly decoded:

host<-"genomics11.bu.edu"
path<-"/cgi-bin/Tractor_dev/external/get_msa.cgi"
dat<-"user_id=0&table=seqs_ucsc_hg18&len=350&gene_set_ids=NM_000029,NM_000064,NM_000066&orgs=Hs,mm8,canFam2"

len <- length( strsplit(dat,"")[[1]])
request<-paste("POST ",path," HTTP/1.1\nHost: ",host,"\nReferer: \nContent-type: application/x-www-form-urlencoded\nContent-length: ",len,"\nConnection: Keep-Alive\n\n",dat,sep="")
fp <- socketConnection(host=host,port=80,server=FALSE,blocking=TRUE)
write(request,fp)
socketSelect(list(fp)) # Wait until results are ready
sock<-readLines(fp)
close(fp)

# Returns:

 > sock
[1] "HTTP/1.1 200 OK"

[2] "Date: Thu, 20 Jul 2006 14:27:37 GMT"

[3] "Server: Apache/2.0.53 (Fedora)"

[4] "Connection: close"

[5] "Transfer-Encoding: chunked"

[6] "Content-Type: text/plain; charset=ISO-8859-1"

[7] ""

[8] "fd0"

[9] "NA\tHs\tmm8\tcanFam2"

[10] "NM_000029\tTAAGCA--AGACTC-TCCCCTGCCCTCTGCCCTCTGCACCTCCGG---
CCTGCATGTC----------CCTGTGGCCTCTTGGGGGTACATCTCCCGGGG---

CTGGGTCAGAAG---------GCCTGGGTGGTTGGCCTCAGG------------------------ 
CTGTCACACACCTAGGGAGATGCTC------------------ 
CCGTTTCTGGGAACCTTGGCCCCGACTCCTGCA----
AACTTCGGTAAATGTGTAACTCGACCCTGCACCGGCTC---------------- 
ACTCTGTTCAGCA----GTGAAACTCTGCATCGATCACTAAGACTTCCTGG- AAGAGGTCCCAGCGT----GAGTGTCGCT---
TCTGGCATCTGTCCTTCTGG---------------------CCAGCCTGTGGTC-------------- 
TGG-CCAAGTGATGTAACCCTCCTCT---CCAGCCT\tTGCAAGTGAGCCCC-
CTTCCTG-----------------------------GCATGCC----------CAGAGAGGCTTACG-- 
AGTGCATCACGAGGGGG-CTTTCATCCCAAG--------- 
GTCTGCATGGCTGGCTTCAGG------------------------TTGTCACAACCC----- 
ACTCAATC------------------CTGTGACTG-------TGGTCCTGGCTCCAGGG---- 
AACTGGGGTAAATGTGTAACCCAAGGCCAGCC---------------------- 
TATTTTTGCATGA----GGCT-------CATCTGCCAGTAGGGCTTCCTGG-AAGGGG- 
CCCAGAG-----GAACATCAC----CCTGGCCCTGATCCATCTTGGT------------------- 
CAAGCCTGGATTCTCA-----------TGG-TTCCCTGATCTGGGTCCTCCC----CCAGCCT
\tCCGG----GGCTCC-TTCCCTG--------------CGCCCTGGGGCCTCAGCACATT---------- 
CTTGGGGACTCTCAGAAGCACACCTCGAGAGG--GCTCTGTCAGAAG---------GCTTG- 
GTGGCTGGCCTCGGC------------------------ 
TTGTCACAGCTCAGGGCAGAGACGCGACACACACACCTACACACAGGTACGGGGCGCTCCGGACCCGGCCCG
GGCAGGGGAGCTGCGGTCAATGTGTAACTCGGCGGCCCAGCGGCTC---------------- 
GTTCTGCTCAGCA----CAGAAAGTGTGCATCGATCTCCCTGACTTCCTGG- AAGGCGTCCCAGCCT----GAGAGTAGCT----CTGGCGCCTGTACCCCCCACC------ CCCGTGGGGCCCCCACCCCCATGGTC--------------GGG- CCAAGTGATGTCACCTCCCGCCTCCCCAGCCT"
[11] "NM_000064\tC-------------------------------------
CAAAAGTGAACTGGGG-ATGAG-GTCCAAGACATCTGCGGTGGGGGGTT- CTCCAGACCTTAGTGTTCTTC--CACTACAAAGTGGGTCCAACAGAGAAAGG------------
TCTGTG----------------TTCACCAGGTGG---CCCTGACCC--- 
TGGGAGAGTCCAGGGCAGGGTGCAGCTGCATTCATGCTGCTGGG----GAACATGC- CCTCAGGTTACTCACCCCATGGA----CATGTTGGCC-CCAGGGACTGAAAA-GCTTAG---- GAAATGGTATTGAGAAATCTGGGGCAGC-CCCAAAAGGG-GAGAGG--CCATGGGGAGAAGGGG-- GGGCTGAG----TGGGGGAAAGGCAGGAGCCAG--ATAAAA----AGCCAGCTCCAGCAGGCGCTGCTCA \tATTTAGCAAGACCTTGGGGGTAGGGAGAACCAGCCATCCAGAAGTG--CTGGGTTACTGG- GACCCAGCTAAGTGTGGGAGGAGGTCACTCTAGACTTCAATGGTCTCTGGTGTAACCAAGTA----
CAACAGGGACCAG------------CCCAGG----------------TTCAGCATCTGG--- 
CCTTGACCC---CAAGAAAAGCCTGAGCCAAG--
CAGGTACTTTCAAGCTCCAGGGTAATGGAAATGTGCCTAGGGTTACTCACCCCA-AGG---- CTTGTTGCCC-CAGGTTTGTGAAAAAGCTTAG----GAAACTATGTTGCGAAATTTTGGGCAGT-
CCCTGGTG--------------CAGGAACAGGGAG--GGACCAGA------GAGGA------- 
GAGCCAT--ATAAAG----AGCCAGCGGCTACAGCCCCAGCTCG 
\t---------------------------------------------------------------------- 
--------------------------------------------------------- 
CCACGGGGAAAGG------------T----------------------TCACCAGCTGG--- 
CCTTGACCC---TGAGGGAGGCCATGGCAAGGGGAAGGTGTGTTCATGTTGCAGGA----GGACATGC- CCTTGGGTTAGTTACCCCC--GA----CACACTGGCC-CCGGGGATTGAAAA-ACTTAG---- GAAATGGTATTGAGTAATCTGGGGCAGC-TGCAGGGAGG-GGGAGG-- CTACAGGAGCTGTGGGCTGGGCTGAA---GGTGGGGGGAGGCTGGGGCCAG--ATAAAA---- GGCAATCCCCAACAGCCTCTGCTCA"
[12] "NM_000066

\tAGCTGTTAGGTTGGTGCAAAAGTAATTGTGGTTTTTGCCATTAAAAGCAATGACAA-------------- --AAACTG-------------
CAATTACTTTTGCACCAACCTAGTCAGTGGCAGAGAATGTACTTGAACCCAGGCTGTCTAGACCTAGATCCC ACAGTCCTTGCCACCTCA--CTAATAGCCTGTCCAC---TTGGCAGCTTACCCTAAAGTTA-----
CAGAGGAATAAACACCATGCTGCTACA- 
GATTTTTCATTAT----------------------------------- 
TCTGGTTGGTTTCCAGAGTGACAGG---TAAGTTT-TTGGTC-TGTGCAAAGTCTG----- 
TTTCCAGTCACTAGTGGCTTTCTGTTTACTTTGCAGAGCTATTTGCTCT-TGGGGACAGAAGCTGACAGT 
\t---------------------------------------------------------------------- 
------------------------------------------------------------------------ 
-------------------------------------------------------------- 
GGCTGCTT-CCATGGAATCA--------------------------------- 
CAGTTCTCACTGT-----------------------------------CCTAG--- 
GTGTGCGGTATCACAAG---TGAGTACATTCGTGCTGTGCAAAGCTGA---- GGTTCCAGGTACAAGCG----CCTGTTTGCTTTGCTGACCTGTTTGCTCTATGTCAACAGAA- CTGAGAGC
\t---------------------------------------------------------------------- 
------------------------------------------- 
GCAAGTGGCAGAGAAAGGATTTGAACCCAGGCAGTCTGGACCTCGAGCC-------------- CCTCATCCTAACAGCTTGTCCTT---TTGGCTGCCT--------CTTT-----
CTTGTGTAGAA-------------------CTTTCCATTGT------"

[13] "a1"

[14] "-----------------------------TCTCCCTGGTGTCCAGAGTCATAGG---TAAGT-
T-TCTGTC-
TGTACAAAGTCTAAGGGGTTTCCGGTCACTGTTGATTTTTTGTTTACTTTGCTGACCTGTTTGCTCTATGGG GACAGAAGCTGACAGC"
[15] ""

[16] "0"

[17] ""

Element 13 appears to be improperly decoded hexadecimal data. Can anyone shed some light on why this is? Are the strings too long for the readLines command to properly read from the socket? Thanks for any additional help anyone can provide.
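
One possible explanation, offered as an assumption rather than anything confirmed in this thread: the response above carries a "Transfer-Encoding: chunked" header, and in HTTP/1.1 chunked encoding each chunk is preceded by a line giving its size in hexadecimal. Under that reading, "fd0" (4048 bytes), "a1" (161 bytes) and the final "0" would be chunk-size lines rather than corrupted data, and element 14 would be the rest of the line that was cut at the chunk boundary. The sketch below shows how such a body could be reassembled in R. decode_chunked is a hypothetical helper (not part of base R or httpRequest), and it assumes the whole body is available as one string of single-byte characters with the CRLF separators intact.

```r
# Hypothetical helper: decode an HTTP/1.1 chunked body held in one string.
# Assumes single-byte characters (e.g. ASCII or ISO-8859-1), so that
# character offsets equal byte offsets, and ignores any trailer headers.
decode_chunked <- function(body) {
  out <- character(0)
  pos <- 1L
  repeat {
    # Find the CRLF that terminates the chunk-size line.
    rest <- substr(body, pos, nchar(body))
    eol <- regexpr("\r\n", rest, fixed = TRUE)
    if (eol < 0) stop("malformed chunked body: missing chunk-size line")
    size_field <- substr(rest, 1L, eol - 1L)
    # Drop optional chunk extensions after ';' and parse the hex size.
    size <- strtoi(sub(";.*$", "", size_field), base = 16L)
    if (is.na(size)) stop("malformed chunk size: ", size_field)
    if (size == 0L) break          # a zero-length chunk ends the body
    start <- pos + eol + 1L        # first character after the size line
    out <- c(out, substr(body, start, start + size - 1L))
    pos <- start + size + 2L       # skip the data and its trailing CRLF
  }
  paste(out, collapse = "")
}

# Small made-up example (not data from the server above):
example <- "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
decode_chunked(example)
# [1] "hello world"
```

Because readLines() has already split the response into lines and dropped the line terminators, this helper would need the raw body rather than the sock vector shown above. Another option that might avoid the issue altogether is to send the request as HTTP/1.0, since servers are not supposed to use chunked encoding when replying to an HTTP/1.0 request.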

--
Mike




On Jul 20, 2006, at 9:49 AM, Duncan Temple Lang wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> There is a hard coded limit of 4096 characters in
> RxmlNanoHTTPScanURL and other ScanURL routines in nanohttp.c
> and nanoftp.c.   And your URI is 5138 and so walks past the bounds
> of the array of length 4096.
>
> I am not yet convinced that it is worthwhile to increase this limit to
> a larger number.  Using POST in this context really is a better
> solution.  But we do need to add checks to the code to ensure that
> the URI string is smaller than 4096.
>
> I'll try to get an opportunity to do that tomorrow before I take off.
>
>   D.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> - --
> Duncan Temple Lang                    duncan@wald.ucdavis.edu
> Department of Statistics              work:  (530) 752-4782
> 4210 Mathematical Sciences Building   fax:   (530) 752-7099
> One Shields Ave.
> University of California at Davis
> Davis,
> CA 95616,
> USA
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (Darwin)
>
> iD8DBQFEv4nO9p/Jzwa2QP4RAnPnAJ974RMxo/KXfxQjaRHoHB1ZsdIy+QCeNhXg
> EDk/WHaFUeH5C2v/607kovo=
> =FAGn
> -----END PGP SIGNATURE-----

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri Jul 21 01:05:58 2006