Re: [R] Converting english words to numeric equivalents

From: Hans-Joerg Bibiko <bibiko_at_eva.mpg.de>
Date: Mon, 28 Jul 2008 12:37:36 +0200

On 28 Jul 2008, at 12:23, Hans-Joerg Bibiko wrote:
> How about this?
>
> unletter <- function(word) {
> gsub('-64',' ',paste(sprintf("%02d",utf8ToInt(tolower(word)) -
> 96),collapse=''))
> }
>
> unletter("abc")
> [1] "010203"
>
> unletter("Aw")
> [1] "0123"
>
> unletter("I walk to school")
> [1] "09 23011211 2015 190308151512"
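For readers who want to see what each step of the quoted one-liner does, here is an annotated sketch of the same encoding. The inverse, `reletter()`, is a hypothetical helper that is not part of the original thread. It assumes single spaces between words and only the letters a-z. Other characters map to negative or out-of-range numbers (for example "." gives -50) and would not round-trip.

```r
# Annotated sketch of the quoted unletter(): each letter becomes its
# two-digit position in the alphabet ("a" = 01, ..., "z" = 26).
unletter_annotated <- function(word) {
  codes <- utf8ToInt(tolower(word))       # Unicode code points; "a" is 97
  pos <- codes - 96L                      # "a" -> 1 ... "z" -> 26; space (32) -> -64
  digits <- sprintf("%02d", pos)          # zero-pad: 1 -> "01", -64 stays "-64"
  joined <- paste(digits, collapse = "")  # glue all pairs into one string
  gsub("-64", " ", joined)                # turn the encoded spaces back into spaces
}

# Hypothetical inverse (not part of the thread): cut each space-separated
# code into two-digit pairs and map every pair back to a lower-case letter.
# Assumes single spaces between words and only the letters a-z.
reletter <- function(code) {
  decode_word <- function(w) {
    starts <- seq(1, nchar(w), by = 2)
    pairs <- substring(w, starts, starts + 1)
    intToUtf8(as.integer(pairs) + 96L)
  }
  words <- strsplit(code, " ", fixed = TRUE)[[1]]
  paste(vapply(words, decode_word, character(1)), collapse = " ")
}

unletter_annotated("I walk to school")
reletter(unletter_annotated("I walk to school"))
```

Because `tolower()` is applied first, the decoded text comes back in lower case.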

I do not know precisely what you want to do.

With:
as.double(unlist(strsplit(unletter("I walk to school")," ")))
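The same conversion can be broken into its intermediate steps. This is a sketch that reuses the quoted `unletter()`. Note that `as.double()` drops leading zeros, so "09" becomes 9, and a decoder would have to pad the numbers back to an even number of digits.

```r
# unletter() as quoted above (letters -> two-digit alphabet positions).
unletter <- function(word) {
  gsub("-64", " ", paste(sprintf("%02d", utf8ToInt(tolower(word)) - 96),
                         collapse = ""))
}

encoded <- unletter("I walk to school")   # "09 23011211 2015 190308151512"
parts <- strsplit(encoded, " ")[[1]]      # one digit string per word
nums <- as.double(parts)                  # leading zeros vanish: "09" -> 9
nums
```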

you will get a numeric vector out of the string. However, this runs into a problem with long words, for example:

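as.double(unlist(strsplit(unletter("schoolschool")," ")))
[1] 1.903082e+23

The 24-digit code is larger than a double can store exactly (about 15-16 significant digits), so the trailing digits, and with them the word, are lost.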
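A short sketch of where the limit lies and one way around it. A word of n letters becomes a 2n-digit number, and doubles hold integers exactly only up to 2^53 = 9007199254740992. Words of at most 8 letters therefore always convert exactly, while longer ones may not. The `letter_codes()` helper is a hypothetical alternative, not something from the post: it keeps one small integer per letter instead of one huge number per word.

```r
# A word of n letters becomes a 2n-digit number. Doubles represent integers
# exactly only up to 2^53 = 9007199254740992 (16 digits), so words of at
# most 8 letters (largest code "2626262626262626") always survive as.double().
safe_code <- "2626262626262626"
sprintf("%.0f", as.double(safe_code)) == safe_code   # exact

code <- "190308151512190308151512"       # unletter("schoolschool"), 24 digits
x <- as.double(code)
sprintf("%.0f", x) == code               # trailing digits have been rounded away

# Alternative sketch (an assumption, not from the post): keep one small
# integer per letter instead of gluing the digits into one huge number.
letter_codes <- function(word) utf8ToInt(tolower(word)) - 96L
letter_codes("schoolschool")
intToUtf8(letter_codes("schoolschool") + 96L)   # back to the word
```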

Thus, if you only need to represent words as numeric values and the numbers themselves carry no meaning, I would suggest parsing your text beforehand to build a hash (a list) of all distinct words in it and assigning a number to each word.
This would end up as a list like:
words <- list("abc" = 1, "I" = 2, "go" = 3)  # etc.

After that you can access these numeric values via:
words['go']
$go
[1] 3
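Building that list by hand does not scale, so here is a sketch of how it could be generated from the text itself. `build_word_index()` is a hypothetical helper. It assumes words are separated by whitespace and that case matters ("I" and "i" are different words). The second part uses base R's `match()` to encode a whole text as integers in one call and decode it again.

```r
# Hypothetical helper: number every distinct word of a text in order of
# first appearance and return a named list like `words` above.
# Assumes words are separated by whitespace and case matters ("I" != "i").
build_word_index <- function(text) {
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  distinct <- unique(tokens)
  setNames(as.list(seq_along(distinct)), distinct)
}

text <- "I go to school and I go home"
words <- build_word_index(text)
words[["go"]]                            # [[ ]] returns the bare number

# Encode the whole text as an integer vector and decode it again.
tokens <- strsplit(text, "[[:space:]]+")[[1]]
ids <- match(tokens, names(words))       # 1 2 3 4 5 1 2 6
names(words)[ids]                        # the original tokens
```

Unlike `words['go']`, which returns a one-element list, `words[["go"]]` returns the number itself.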

--Hans



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 28 Jul 2008 - 10:40:21 GMT
