Re: [R] NEW: Sociolects in R

From: Roland Rau
Date: Tue, 01 Apr 2008 11:33:40 -0400

Dear Peter,

congratulations. Looks very impressive. Seems like you guys in Denmark are very productive this time of the year. This brings me to my actual problem: isn't Lars Polifo a close relative of Rolf Poalis? Has there been any recent progress with the 'sas2r' parser?


Peter Dalgaard wrote:
> The R translation teams have done a great job in making R usable for
> people who do not have English as their mother tongue. However, even
> within English speaking countries, there are groups which have trouble
> with the language, and it may be valuable to support the Sociolects of
> these groups too.
> Thanks to a generous contribution from Lars Polifo, these features will
> be made available in an upcoming version of R.
> As it turns out, there are some particularly interesting challenges that
> need to be addressed. Consider for instance the translation of the t
> test in the locale en_SF_US.UTF8 (notice the interjection of the code
> "SF" to denote "San Fernando Valley"):
> t.test(extra ~ group, oh, baby, data = sleep)
> Welch Two Sample t-test
> data: extra by group
> t = -1.8608, like, df = 17.776, like, wow, p-value = 0.0794
> alternative hypothesis: true difference in means is like, ya know, not equal to 0
> 95 percent confidence interval:
> -3.3654832 0.2054832
> sample estimates:
> mean in group 1 mean in group 2
>            0.75            2.33
> Notice that in addition to the simple message string modifications, it
> has been necessary to modify the parser so as to delete obviously
> superfluous arguments such as "oh" or "baby" (a particular issue here is
> that the argument "like" might actually be intended to mean likelihood).
> Similarly, for se_KC_SE.UTF8 (KC for "kitchen") we have alternate
> spellings of arguments like "data":
> t.test(ixtra ~ gruoop, deta = sleep)
> Velch Tvu Semple-a t-test
> deta: ixtra by gruoop
> t = -1.8608, dff = 17.776, p-felooe-a = 0.0794
> elterneteefe-a hypuzeesees: trooe-a deefffference-a in meuns is nut iqooel tu 0
> 95 percent cunffeedence-a interfel:
> -3.3654832 0.2054832
> semple-a isteemetes:
> meun in gruoop 1 meun in gruoop 2
>             0.75             2.33
> Canadian English poses particular problems, which have not yet been
> resolved. If we are to do it properly, it would entail modifications to
> the R language itself. For instance we'd have to introduce a "four" loop
> and change the end-brace to the four-character string "eh?}".
Received on Tue 01 Apr 2008 - 15:36:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Apr 2008 - 16:30:25 GMT.
