From: Tony Plate <tplate_at_acm.org>
Date: Tue 16 Aug 2005 - 08:40:05 EST

Here's one way of working with the data you gave:

```
  HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
```
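For anyone following along, a data frame like `x` can be rebuilt from the tab-delimited lines in the question. This is only a sketch: the message does not show how `x` was actually read in, so the `read.delim()` call and the inline `raw` string are assumptions for illustration.

```r
# Assumed input: the poster's tab-delimited lines, inlined here as one string.
raw <- paste(
  "A1\tB1\tC1\tX11;X12;X13",
  "A2\tB2\tC2\tX21;X22;X23;X24;X25",
  "A3\tB3\tC3",
  "A4\tB4\tC4\tX41;X42;X43",
  "A5\tB5\tC5\tX51",
  sep = "\n"
)

# read.delim() splits on tabs; fill = TRUE is its default, so the short
# third row is padded with an empty string in the last column.
x <- read.delim(
  text = raw, header = FALSE,
  col.names = c("HEADER1", "HEADER2", "HEADER3", "HEADER3.1"),
  stringsAsFactors = FALSE
)
```

The semicolon-delimited field is left as a single character column here, which is what the steps that follow expect.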
```
> apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"
```

```
> do.call("rbind", apply(x, 1, function(x) {
+    y <- strsplit(x[4], ";")[[1]]
+    x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=T)
+    return(cbind(x3, if (length(y)) y else "NA"))
+ }))
      [,1] [,2] [,3] [,4]
 [1,] "A1" "B1" "C1" "X11"
 [2,] "A1" "B1" "C1" "X12"
 [3,] "A1" "B1" "C1" "X13"
 [4,] "A2" "B2" "C2" "X21"
 [5,] "A2" "B2" "C2" "X22"
 [6,] "A2" "B2" "C2" "X23"
 [7,] "A2" "B2" "C2" "X24"
 [8,] "A2" "B2" "C2" "X25"
 [9,] "A3" "B3" "C3" "NA"
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"
```

This is of course a matrix; you can convert it back to a data frame with as.data.frame() if you wish. Use "NA" (with quotes) to get the literal string "NA" in column 4, or NA (without quotes) to get an actual character NA value. If you're processing a huge amount of data, you can probably do better by rewriting the above code to avoid implicit coercions of data types (apply() converts the data frame to a character matrix before calling the function on each row).
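As a rough illustration of that last suggestion (this is not the code from the session above, just one possible rewrite), the sketch below splits column 4 once for the whole column, uses a real `NA` for empty cells, and returns a data frame directly. The names `parts`, `idx` and `long` are made up for the example, and `x` is rebuilt by hand on the assumption that its columns are plain character vectors.

```r
# Same example data as above, built directly; column 4 is "" where empty.
x <- data.frame(
  HEADER1 = c("A1", "A2", "A3", "A4", "A5"),
  HEADER2 = c("B1", "B2", "B3", "B4", "B5"),
  HEADER3 = c("C1", "C2", "C3", "C4", "C5"),
  HEADER3.1 = c("X11;X12;X13", "X21;X22;X23;X24;X25", "",
                "X41;X42;X43", "X51"),
  stringsAsFactors = FALSE
)

# Split column 4 once for all rows; empty cells give character(0).
parts <- strsplit(x$HEADER3.1, ";", fixed = TRUE)

# Swap empty results for a real character NA so those rows are kept.
parts[lengths(parts) == 0] <- list(NA_character_)

# Repeat each row index once per piece, then attach the pieces.
idx <- rep(seq_len(nrow(x)), lengths(parts))
long <- data.frame(x[idx, 1:3], HEADER3.1 = unlist(parts),
                   row.names = NULL, stringsAsFactors = FALSE)
```

Because the row index is expanded with `rep()` rather than by calling a function per row, no matrix conversion takes place and the first three columns keep whatever types they had in `x`.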

hope this helps,

Tony Plate

S.O. Nyangoma wrote:

```
> I have a dataset that is basically structureless. Its dimension varies
> from row to row and sep(s) are a mixture of tab and semi colon (;). An
> example is
>
> A1       B1      C1       X11;X12;X13
> A2       B2      C2       X21;X22;X23;X24;X25
> A3       B3      C3
> A4       B4      C4       X41;X42;X43
> A5       B5      C5       X51
>
> etc., say. Note that a blank under HEADER3 corresponds to non-
> occurrence and all semi colon (;) delimited variables are under
> HEADER3. These values run into tens of thousands. I want to give some
> order to this queer matrix to something like:
>
> A1       B1      C1       X11
> A1       B1      C1       X12
> A1       B1      C1       X13
> A1       B1      C1       X14
> A2       B2      C2       X21
> A2       B2      C2       X22
> A2       B2      C2       X23
> A2       B2      C2       X24
> A2       B2      C2       X25
> A2       B2      C2       X26
> A3       B3      C3       NA
> A4       B4      C4       X41
> A4       B4      C4       X42
> A4       B4      C4       X43
>
> Is there a brilliant R-way of doing such task?
>
> Goodday. Stephen.
```
