[Rd] A question about the API mkchar()

From: Fán Lóng <foylong_at_gmail.com>
Date: Tue, 28 Oct 2008 18:26:33 +0800


Hi guys,

I've got a question about the API mkChar(). I am having difficulty passing a UTF-8 string to mkChar() in R-2.7.0.

I want to pass a UTF-8 string str_jan (containing Japanese characters such as ふ, whose UTF-8 encoding is E381B5) to the R API function SEXP mkChar(const char *name); all we need is to create the SEXP from the string we pass in.
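To check whether a string still holds the expected UTF-8 bytes at a given point, a small byte dump can help. The following is only a minimal sketch in plain C; `hex_dump_bytes` is a hypothetical helper name, not part of the R API. It writes the bytes of a NUL-terminated string as uppercase hex into a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the R API).
 * Writes the bytes of the NUL-terminated string s as uppercase hex
 * digits into out, which has room for outlen chars.
 * Returns the number of bytes dumped, or -1 if out is too small. */
int hex_dump_bytes(const char *s, char *out, size_t outlen)
{
    size_t n = strlen(s);

    /* Two hex digits per byte plus the terminating NUL. */
    if (outlen < 2 * n + 1)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* The cast avoids sign extension for bytes >= 0x80. */
        snprintf(out + 2 * i, 3, "%02X", (unsigned char) s[i]);
    }
    out[2 * n] = '\0';
    return (int) n;
}
```

Applied to the input string, and then to the `const char *` obtained with `CHAR()` from the resulting object, it would show whether the stored bytes differ from the input or whether only the displayed form differs.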

Unfortunately, I found that when I pass the variable str_jan, R automatically converts it according to the current locale setting, so the function only works correctly in the English locale; under other locales, such as Japanese or Chinese, the string is converted incorrectly. As a matter of fact, the string is already UTF-8 encoded Unicode and does not need to be converted at all.
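Whether a buffer really is well-formed UTF-8 can be checked independently of the locale. The sketch below is one possible check in plain C, under the assumption that the usual UTF-8 rules apply (no overlong forms, no surrogates, nothing above U+10FFFF); `is_valid_utf8` is a hypothetical name, not an R API function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the R API).
 * Returns 1 if the n bytes at s are well-formed UTF-8, otherwise 0. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len;        /* total length of this sequence in bytes */
        unsigned long cp;  /* code point decoded so far */

        if (c < 0x80) {                    /* single-byte ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {   /* lead byte 110xxxxx */
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {   /* lead byte 1110xxxx */
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {   /* lead byte 11110xxx */
            len = 4; cp = c & 0x07;
        } else {
            return 0;                      /* stray continuation or invalid lead */
        }

        if (n - i < len)
            return 0;                      /* sequence is truncated */

        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                  /* not a continuation byte 10xxxxxx */
            cp = (cp << 6) | (unsigned long) (s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if ((len == 2 && cp < 0x80) ||
            (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += len;
    }
    return 1;
}
```

The three bytes E3 81 B5 pass this check, while a buffer that was re-encoded into a legacy multibyte encoding somewhere along the way would typically fail it.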

I also tried SEXP Rf_mkCharCE(const char *, cetype_t), passing CE_UTF8 as the cetype_t argument, but the result was worse: it came back as UCS code, a kind of Unicode used on the Windows platform.

All I want is a SEXP object containing the original UTF-8 string, no matter which locale is currently set. What is the usual way to do this?

Thanks,

Long



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 28 Oct 2008 - 10:30:15 GMT

