Re: [R] extract fixed width fields from a string

From: jim holtman <jholtman_at_gmail.com>
Date: Fri, 20 Jan 2012 15:55:39 -0500

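Here is part of it: a case-insensitive conversion from base 36 to numeric. It uses chartr() to map the letters, in either case, onto the 26 characters that follow '9' in ASCII (':' through 'S'), so that subtracting 48 (the code of '0') from each character code gives the digit value, 0 to 35. You can extend it to base 64 using the same approach, except that upper and lower case then have to map to different values.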

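> base36ToInteger <- function (Str)
+ {
+     # map a-z and A-Z onto the 26 characters that follow '9' in ASCII
+     common <- chartr(
+         "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # input
+       , ":;<=>?@ABCDEFGHIJKLMNOPQRS:;<=>?@ABCDEFGHIJKLMNOPQRS"  # 'magic' translation
+       , Str
+       )
+     # '0'..'9' -> 0..9, ':'..'S' -> 10..35
+     x <- as.numeric(charToRaw(common)) - 48
+     # weight each digit by its power of 36, most significant digit first
+     sum(x * 36 ^ rev(seq_along(x) - 1))
+ }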
> base36ToInteger('1')
[1] 1
> base36ToInteger('12')
[1] 38
> base36ToInteger('123')
[1] 1371
> base36ToInteger('1234')
[1] 49360
> base36ToInteger('12345')
[1] 1776965
> base36ToInteger('123456')
[1] 63970746
>
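
For the base 64 field, the alphabet given in the question (0-9a-zA-Z-_) is case sensitive, so the two halves of the chartr() translation cannot share targets. Here is a minimal sketch of one alternative, which looks each character up in an explicit digit vector instead. The name baseNToInteger and its digits argument are made up for this illustration, and the digit order is assumed from the question.

```r
# Hypothetical helper: decode a single string whose digits are drawn from `digits`.
# The digit order (0-9, a-z, A-Z, '-', '_') is an assumption taken from the question.
baseNToInteger <- function(Str,
                           digits = c(0:9, letters, LETTERS, "-", "_"))
{
    chars <- strsplit(Str, "", fixed = TRUE)[[1]]
    x <- match(chars, digits) - 1          # position in the alphabet = digit value
    if (anyNA(x)) stop("invalid digit in '", Str, "'")
    base <- length(digits)
    sum(x * base ^ rev(seq_along(x) - 1))  # most significant digit first
}

baseNToInteger("a")    # 10
baseNToInteger("A")    # 36
baseNToInteger("_")    # 63
baseNToInteger("10")   # 64
```

Passing digits = c(0:9, letters) and lower-casing the input with tolower() should give the same results as base36ToInteger.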

On Fri, Jan 20, 2012 at 3:25 PM, Sam Steingold <sds_at_gnu.org> wrote:
> On Fri, Jan 20, 2012 at 14:05, Sarah Goslee <sarah.goslee@gmail.com> wrote:

>> Reproducible example, please. This doesn't make a whole lot of sense
>> otherwise.
>

> here is the string:
> "1288915200|00000704000000905a00000A118"
>

> I want the following data extracted from it:
> 1. the decimal number before "|": 1288915200
> 2. the string after "|" split into 3 records, each of length 9 bytes,
> with each record then split into 3 fields:
> id: the first 6 bytes, int, base 36;
> count: the next 2 bytes, int, base 10;
> offset: the last 1 byte, int, base 64 (0-9a-zA-Z-_)
> i.e., the above line is:
> id=7; count=4; offset=0
> id=9; count=5; offset=10
> id=10; count=11; offset=8
>
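
The layout described above can be sketched in base R as follows. This assumes the part after "|" is always a whole number of 9-character records, and it reads the offset alphabet as 0-9, a-z, A-Z, '-', '_' in that order. The function name parseRecord is made up for this illustration. strtoi() accepts bases up to 36 and ignores letter case, so it covers the id and count fields.

```r
# Hypothetical helper: split one "timestamp|records" string into its fields.
parseRecord <- function(s)
{
    parts <- strsplit(s, "|", fixed = TRUE)[[1]]
    body  <- parts[2]
    # start positions of the 9-character records (assumes nchar(body) is a multiple of 9)
    starts <- seq(1, nchar(body), by = 9)
    recs   <- substring(body, starts, starts + 8)
    # assumed digit order for the base 64 offset field
    b64 <- c(0:9, letters, LETTERS, "-", "_")
    list(
        time = as.numeric(parts[1]),
        data = data.frame(
            id     = strtoi(substr(recs, 1, 6), base = 36L),
            count  = strtoi(substr(recs, 7, 8), base = 10L),
            offset = match(substr(recs, 9, 9), b64) - 1
        )
    )
}

r <- parseRecord("1288915200|00000704000000905a00000A118")
r$time   # 1288915200
r$data   # id 7, 9, 10; count 4, 5, 11; offset 0, 10, 8
```

Note that strtoi() returns NA for values that do not fit in an R integer, which a 6-digit base 36 id can exceed, so a numeric conversion may be the safer choice for large ids.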

> thanks.
>
>> On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold <sds_at_gnu.org> wrote:
>>> Hi,
>>> I have a data frame with one column containing strings of the form "ABC...|XYZ..."
>>> where ABC etc are fields of 6 alphanumeric characters each
>>> and XYZ etc are fields of 8 alphanumeric characters each;
>>> "|" is a mandatory separator;
>>> I do not know in advance how many fields of each kind each row will contain.
>>> I need to extract these fields from the string.
>>
>> This is already a data frame, so you don't need to import it into R,
>> just process it?
>

> yes.
>
>> I don't know. Save them as a list, most likely.
>

> can a column contain lists?
>
>>> First thing I want to do is to have a count table of them.
>>> Then I thought of adding an extra column for each field value and
>>> putting 0/1 there, e.g., frame
>>> 1,AB
>>> 2,BCD
>>
>> I thought we had integers at this point?
>

> yes, A..D are placeholders for integers
>
>>> What do people do?
>>> Can I have a column of "sets" in a data frame?
>>> Does R support the "set" data type?
>>
>> factor() seems to be what you're looking for.
>

> no, a column of factors will contain a single factor item in each row.
> e.g.:
> 1 A
> 2 B
> 3 A
> 4 C
> I want each row to contain a set of factor items:
> 1 A&B
> 2 A
> 3 C
> 4 <void>
>
>
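
On the question of whether a column can hold a set per row: a data frame column can be a list, so each row can carry a vector of any length, including an empty one. Here is a minimal sketch with made-up data mirroring the four rows above (the names df, items, counts, lev and ind are only for illustration). It also builds the count table and the 0/1 indicator columns mentioned earlier in the thread.

```r
df <- data.frame(row = 1:4)
# a list column: one character vector per row, possibly empty
df$items <- list(c("A", "B"), "A", "C", character(0))

# count table over all items in all rows
counts <- table(unlist(df$items))

# 0/1 indicator columns, one per distinct item
lev <- sort(unique(unlist(df$items)))
ind <- t(sapply(df$items, function(s) as.integer(lev %in% s)))
colnames(ind) <- lev

counts   # A appears twice, B and C once each
ind      # 4 x 3 matrix; row 1 is 1 1 0, row 4 is all zeros
```

R has no dedicated set type, but union(), intersect(), setdiff() and %in% treat ordinary vectors as sets, which is usually enough for list columns like this one.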

> --
> Sam Steingold <http://sds.podval.org>
>

> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Received on Fri 20 Jan 2012 - 20:58:12 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 20 Jan 2012 - 21:20:07 GMT.
