Re: [Rd] problem gsub in the locale of CP932 and SJIS (PR#9751)

From: <nakama_at_ki.rim.or.jp>
Date: Mon, 25 Jun 2007 12:08:31 +0200 (CEST)


Thanks.

As for mbs_init, I agree that it is better called once, outside the loop.
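
To illustrate what "outside the loop" means, here is a minimal sketch using only standard C. It is not the R source: count_mb_chars is a hypothetical name, and plain memset and mbrtowc stand in for R's mbs_init and Mbrtowc. The conversion state is set up once for the whole string and then carried from character to character.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
 * Returns the number of characters in s under the current LC_CTYPE
 * locale, or (size_t)-1 if an invalid or truncated sequence is found. */
size_t count_mb_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t n = 0;

    /* Initial conversion state: set once, before the loop. */
    memset(&st, 0, sizeof st);

    while (left > 0) {
        size_t clen = mbrtowc(NULL, s, left, &st);
        if (clen == (size_t)-1 || clen == (size_t)-2)
            return (size_t)-1;   /* invalid or incomplete character */
        if (clen == 0)
            break;               /* NUL reached (not expected here) */
        s += clen;               /* step over the whole character */
        left -= clen;
        n++;
    }
    return n;
}
```

In a program that has called setlocale(LC_CTYPE, ""), the same loop steps over SJIS or CP932 characters two bytes at a time, so a 0x5c trail byte is never examined on its own.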

The problem code is:
> gsub("A","\u30bd\u8868","A")

In EUC-JP and UTF-8 locales it works without a problem.

> Sys.getlocale("LC_CTYPE") # SHIFT_JIS system.
[1] "ja_JP.SJIS"
> charToRaw("\u30bd\u8868") # The second byte of each character is 0x5c
[1] 83 5c 95 5c
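
The effect of those bytes can be sketched in standard C without any SJIS locale installed. This is an illustration, not the R code: the function names are hypothetical, and the lead-byte test is a hand-written assumption (0x81-0x9F and 0xE0-0xFC, the usual Shift_JIS/CP932 lead-byte ranges) standing in for Mbrtowc.

```c
#include <stddef.h>

/* Assumption: hand-written Shift_JIS/CP932 lead-byte test. */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-by-byte scan: every 0x5c byte is taken for a backslash. */
size_t count_backslash_bytewise(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (*s == '\\')
            n++;
    return n;
}

/* Character-aware scan: a two-byte character is skipped as a unit,
 * so a 0x5c trail byte is never taken for a backslash. */
size_t count_backslash_sjis(const char *s)
{
    size_t n = 0;
    while (*s) {
        if (sjis_is_lead_byte((unsigned char)*s) && s[1] != '\0') {
            s += 2;
            continue;
        }
        if (*s == '\\')
            n++;
        s++;
    }
    return n;
}
```

For the four bytes 83 5c 95 5c, the byte-by-byte scan reports two backslashes and the character-aware scan reports none.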

2007/6/25, Prof Brian Ripley <ripley_at_stats.ox.ac.uk>:
> Thanks for this.
>
> I don't think the patch is quite right. As I understand it, mbstate_t
> should be initialized at the start of the string, not before each
> character, and that is what is done in the rest of R.
>
> Also, do you have an example I can use to test the patch, please?
>
> R 2.5.1 is now in code freeze and I don't think this is vital for that.
>
>
> On Sun, 24 Jun 2007, nakama_at_ki.rim.or.jp wrote:
>
> > Full_Name: Ei-ji Nakama
> > Version: R-2.5.0
> > OS: any
> > Submission from: (NULL) (219.117.236.5)
> >
> >
> > gsub misbehaves in the CP932 and SJIS locales.
> > The trouble comes from multibyte characters whose second byte is 0x5c (backslash).
> >
> > --- R-2.5.0.orig/src/main/character.c 2007-04-03 11:05:05.000000000 +0900
> > +++ R-2.5.0/src/main/character.c 2007-06-24 22:31:06.000000000 +0900
> > @@ -986,6 +986,17 @@
> > char *p = repl;
> > n = strlen(repl) - (regmatch[0].rm_eo - regmatch[0].rm_so);
> > while (*p) {
> > +#ifdef SUPPORT_MBCS
> > + if(mbcslocale){
> > + int clen;
> > + mbstate_t mb_st;
> > + mbs_init(&mb_st);
> > + if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > + p+=clen;
> > + continue;
> > + }
> > + }
> > +#endif
> > if (*p == '\\') {
> > if ('1' <= p[1] && p[1] <= '9') {
> > k = p[1] - '0';
> > @@ -1014,6 +1025,18 @@
> > int i, k;
> > char *p = repl, *t = target;
> > while (*p) {
> > +#ifdef SUPPORT_MBCS
> > + if(mbcslocale){
> > + int clen;
> > + mbstate_t mb_st;
> > + mbs_init(&mb_st);
> > + if((clen = Mbrtowc(NULL, p, MB_CUR_MAX, &mb_st)) > 1){
> > + for ( i=0; i<clen; i++)
> > + *t++ = *p++;
> > + continue;
> > + }
> > + }
> > +#endif
> > if (*p == '\\') {
> > if ('1' <= p[1] && p[1] <= '9') {
> > k = p[1] - '0';
> >
> > ______________________________________________
> > R-devel_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
>

--
EI-JI Nakama <nakama_at_ki.rim.or.jp>
"\u4e2d\u9593\u6804\u6cbb" <nakama_at_ki.rim.or.jp>



Received on Mon 25 Jun 2007 - 10:15:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.