[Rd] more powerful iconv

From: Matt Shotwell <shotwelm_at_musc.edu>
Date: Sat, 19 Jun 2010 16:53:00 -0400


R community,

As you may know, R's iconv doesn't work well when converting to and from encodings that allow embedded nulls. For example:

> iconv("foo", to="UTF-16")

Error in iconv("foo", to = "UTF-16") :
  embedded nul in string: '\xff\xfef\0o\0o\0'

However, I don't believe embedded nulls are the real issue here; rather, R's iconv doesn't accept objects of type RAWSXP. The iconv mechanism, after all, operates on encoded binary data, not necessarily null-terminated C strings. I'd like to submit a very small patch (12 lines w/o documentation) that allows R's iconv to operate on raw objects without affecting its behavior on character vectors. To keep this message terse, I've put additional discussion, a description of what the patch does, and examples here: http://biostatmatt.com/archives/456
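
To illustrate the point about binary data outside of R, here is a minimal sketch in plain C, assuming a POSIX iconv(3) is available (on some systems you may need to link with -liconv). The helper name utf8_to_utf16le is hypothetical and is not part of R or of the patch. It uses "UTF-16LE" rather than "UTF-16" so that no byte-order mark is emitted and the result does not depend on platform endianness.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of R or the patch): convert inlen bytes of
 * UTF-8 into UTF-16LE. Returns the number of bytes written to out, or
 * (size_t)-1 on failure. The length is tracked explicitly because the
 * output may legitimately contain 0x00 bytes. */
size_t utf8_to_utf16le(const char *in, size_t inlen,
                       unsigned char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* iconv wants non-const pointers */
    char *outp = (char *)out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes produced = capacity minus what is left. */
    return outcap - outleft;
}
```

Calling utf8_to_utf16le("foo", 3, buf, sizeof buf) should yield six bytes, 'f' 0 'o' 0 'o' 0. Treating that buffer as a C string would stop after the first byte, which is why a length-carrying container such as a raw vector is the natural fit.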

Also, here is a link to the patch file:
http://biostatmatt.com/R/R-devel-iconv-0.0.patch

If this change is adopted, I'd be happy to submit a documentation patch also.

-Matt

Index: src/library/base/R/New-Internal.R


 iconv <- function(x, from = "", to = "", sub = NA, mark = TRUE)
 {
-    if(!is.character(x)) x <- as.character(x)
+    if(!is.character(x) && !is.raw(x)) x <- as.character(x)
     .Internal(iconv(x, from, to, as.character(sub), mark))
 }

Index: src/main/sysutils.c


                 nout = cbuff.bufsize - 1 - outb;
@@ -632,7 +633,12 @@

 		}
 		SET_STRING_ELT(ans, i, mkCharLenCE(cbuff.data, nout, ienc));
 	    }
-	    else SET_STRING_ELT(ans, i, NA_STRING);
+	    else if(!isRawx) SET_STRING_ELT(ans, i, NA_STRING);
+	    else {
+		nout = cbuff.bufsize - 1 - outb;
+		ans = allocVector(RAWSXP, nout);
+		memcpy(RAW(ans), cbuff.data, nout);
+	    }
 	}
 	Riconv_close(obj);
 	R_FreeStringBuffer(&cbuff);

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat 19 Jun 2010 - 20:57:35 GMT

