Re: [R] scan html: sep = "<td>"

From: Uwe Ligges <ligges_at_statistik.uni-dortmund.de>
Date: Tue 05 Apr 2005 - 01:30:01 EST

Christoph Lehmann wrote:

> entry from html:
>
> <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>
> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
> <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>
> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
>
>
>
> using
> left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
> sep=c("<td>", "</td>"))
>
>
> yields
>
> > left.data
> [1] " " "tr bgcolor=#9090f0>" "td align=right>"
> [4] "b>BM" "/b>" "/td>"
> [7] "td> 0.952" "/td>" "td> 0.136"
> [10] "/td>" "td> 6.984" "/td>"
> [13] "td>0.000000" "/td>" "/tr>"
> [16] " " "tr bgcolor=#9090f0>" "td align=right>"
> [19] "b>BH" "/b>" "/td>"
> [22] "td> 1.338" "/td>" "td> 0.136"
> [25] "/td>" "td> 9.821" "/td>"
> [28] "td>0.000000" "/td>" "/tr>"
>
> why doesn't it detect the whole '<tr> as sep?
>
>
> Uwe Ligges wrote:
>

>> Christoph Lehmann wrote:
>>
>>> Hi
>>> I am trying to import HTML text and I need to split the fields at each
>>> <td> or </td> entry
>>>
>>> How can I succeed? sep = '<td>' doesn't yield the right result
>>
>>
>> If it fits pairwise together, use
>>   sep=c("<td>", "</td>")

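Apologies, one should not send untested code. "sep" must be a single character, not a string containing more than one character. Judging from your output, only the first character, "<", was used as the separator, which is why the fields were split at every "<" rather than at the whole tag.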

So you may want to try out my second suggestion.
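
A minimal, untested-on-your-file sketch of that second suggestion, assuming the rows look like the two quoted above. The vector "html" stands in for what readLines() would return, and the regular expressions and the 5 cells per row are assumptions taken from this sample, not a general HTML parser:

```r
# Hypothetical input: the two table rows quoted above, held in a character
# vector the way readLines() would return them.
# In practice: html <- readLines(paste(path, left.file, sep = ""))
html <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>',
  ' 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>',
  ' 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Glue the lines together so a cell that spans a line break stays intact.
txt <- paste(html, collapse = "")

# Split on any opening or closing td tag (with or without attributes).
fields <- strsplit(txt, "</?td[^>]*>")[[1]]

# Strip the remaining tags (<tr>, <b>, ...) and surrounding whitespace,
# then drop the pieces that end up empty.
fields <- gsub("<[^>]*>", "", fields)
fields <- gsub("^[[:space:]]+|[[:space:]]+$", "", fields)
fields <- fields[fields != ""]

# One row per <tr>; 5 cells per row is specific to this sample.
left.data <- matrix(fields, ncol = 5, byrow = TRUE)
```

The first column then holds the labels, and something like as.numeric(left.data[, 2]) should convert a value column to numbers.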

Uwe Ligges

>> if not, you can read the whole lot with readLines() and then use
>> strsplit() on both patterns, for example.
>>
>> Uwe Ligges
>>
>>
>>
>>> thanks for hints
>>>
>>> ______________________________________________
>>> R-help@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide! 
>>> http://www.R-project.org/posting-guide.html
>>
>>
>>

Received on Tue Apr 05 01:51:54 2005
