Re: [R] queer data set

From: Tony Plate <tplate_at_acm.org>
Date: Tue 16 Aug 2005 - 08:40:05 EST

Here's one way of working with the data you gave:

 > x <- read.table(file("clipboard"), fill=TRUE, header=TRUE)
 > x
   HEADER1 HEADER2 HEADER3           HEADER3.1
1      A1      B1      C1         X11;X12;X13
2      A2      B2      C2 X21;X22;X23;X24;X25
3      A3      B3      C3
4      A4      B4      C4         X41;X42;X43
5      A5      B5      C5                 X51
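
The "clipboard" connection above is platform dependent, so the same read can also be reproduced from an inline string. The following is only a sketch, not part of the original session: the variable name txt is made up for illustration, and the text= argument of read.table() needs a reasonably recent R. The duplicated HEADER3 column name is made unique by read.table() itself, which is where HEADER3.1 comes from.

```r
# Sample data copied from the question; 'txt' is a hypothetical name.
txt <- "HEADER1 HEADER2 HEADER3 HEADER3
A1 B1 C1 X11;X12;X13
A2 B2 C2 X21;X22;X23;X24;X25
A3 B3 C3
A4 B4 C4 X41;X42;X43
A5 B5 C5 X51"

# fill = TRUE pads the short third row with a blank fourth field;
# stringsAsFactors = FALSE keeps every column as character.
x <- read.table(text = txt, fill = TRUE, header = TRUE,
                stringsAsFactors = FALSE)
print(x)
```

Reading the columns as character rather than factor also avoids one of the coercions that apply() would otherwise perform.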
 > apply(x, 1, function(x) strsplit(x[4], ";")[[1]])
$"1"
[1] "X11" "X12" "X13"

$"2"
[1] "X21" "X22" "X23" "X24" "X25"

$"3"
character(0)

$"4"
[1] "X41" "X42" "X43"

$"5"
[1] "X51"

 > do.call("rbind", apply(x, 1, function(x) {
+    y <- strsplit(x[4], ";")[[1]]
+    x3 <- matrix(x[1:3], ncol=3, nrow=max(1,length(y)), byrow=TRUE)
+    return(cbind(x3, if (length(y)) y else "NA"))
+ }))
      [,1] [,2] [,3] [,4]
 [1,] "A1" "B1" "C1" "X11"
 [2,] "A1" "B1" "C1" "X12"
 [3,] "A1" "B1" "C1" "X13"
 [4,] "A2" "B2" "C2" "X21"
 [5,] "A2" "B2" "C2" "X22"
 [6,] "A2" "B2" "C2" "X23"
 [7,] "A2" "B2" "C2" "X24"
 [8,] "A2" "B2" "C2" "X25"
 [9,] "A3" "B3" "C3" "NA"
[10,] "A4" "B4" "C4" "X41"
[11,] "A4" "B4" "C4" "X42"
[12,] "A4" "B4" "C4" "X43"
[13,] "A5" "B5" "C5" "X51"

 >

This is of course a matrix; you can convert it back to a data frame with as.data.frame() if you wish. In the code above, use "NA" (with quotes) to get the literal string "NA" in column 4, or NA (without quotes) to get an actual missing character value. If you're processing a huge amount of data, you can probably do better by rewriting the above code to avoid implicit coercions of data types (apply() converts the whole data frame to a character matrix before it calls the function on each row).
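
One way to act on that last suggestion is sketched here. This is not from the original session: it is a minimal base-R sketch that assumes the data are already in a data frame with character columns (the sample x is rebuilt by hand for illustration, and the names parts, n and long are made up), and it uses lengths(), which needs R 3.2.0 or later. It splits the fourth column once as a vector and repeats row indices, rather than looping over rows with apply().

```r
# Sample data mirroring the example in this thread (rebuilt by hand).
x <- data.frame(
  HEADER1   = c("A1", "A2", "A3", "A4", "A5"),
  HEADER2   = c("B1", "B2", "B3", "B4", "B5"),
  HEADER3   = c("C1", "C2", "C3", "C4", "C5"),
  HEADER3.1 = c("X11;X12;X13", "X21;X22;X23;X24;X25", "",
                "X41;X42;X43", "X51"),
  stringsAsFactors = FALSE
)

# Split the fourth column in one vectorised call; an empty string
# yields character(0).
parts <- strsplit(as.character(x[[4]]), ";", fixed = TRUE)

# Rows with nothing to split get a single real NA so they are kept.
parts[lengths(parts) == 0] <- list(NA_character_)

# Repeat each row index once per piece, then attach the pieces.
n <- lengths(parts)
long <- x[rep(seq_len(nrow(x)), times = n), 1:3]
long$HEADER3.1 <- unlist(parts)
rownames(long) <- NULL
print(long)
```

The result is a data frame rather than a matrix, with one row per semicolon-separated value. Replace NA_character_ with "NA" if the literal string is wanted instead of a missing value.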

hope this helps,

Tony Plate

S.O. Nyangoma wrote:

> I have a dataset that is basically structureless. Its dimension varies 
> from row to row and the separators are a mixture of tab and semicolon (;). 
> An example is
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1       B1      C1       X11;X12;X13
> A2       B2      C2       X21;X22;X23;X24;X25
> A3       B3      C3       
> A4       B4      C4       X41;X42;X43
> A5       B5      C5       X51
> 
> etc., say. Note that a blank under HEADER3 corresponds to 
> non-occurrence and all semicolon (;) delimited variables are under 
> HEADER3. These values run into tens of thousands. I want to give some 
> order to this queer matrix, turning it into something like:
> 
> HEADER1 HEADER2 HEADER3   HEADER3
> A1       B1      C1       X11
> A1       B1      C1       X12
> A1       B1      C1       X13
> A2       B2      C2       X21
> A2       B2      C2       X22
> A2       B2      C2       X23
> A2       B2      C2       X24
> A2       B2      C2       X25
> A3       B3      C3       NA
> A4       B4      C4       X41
> A4       B4      C4       X42
> A4       B4      C4       X43
> A5       B5      C5       X51
> 
> Is there a brilliant R-way of doing such a task?
> 
> Goodday. Stephen.
> 

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Aug 16 08:45:07 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 15:21:54 EST