Re: [R] data shape

From: Charilaos Skiadas <cskiadas_at_gmail.com>
Date: Thu, 20 Dec 2007 11:04:18 -0500

HI Tom,
On Dec 20, 2007, at 9:06 AM, Tom Sgouros wrote:

>
> Hello:
>
> I have been give a spreadsheet to work with formed as one big table.
> What it consists of is a 10-row-by-40-column table for each of
> about 70
> different locations. In other words, the table row names are repeated
> 70 times, once for each of the locations (whose names also appear
> in the
> same column, where it's talking about the totals for that
> location), e.g.:
> A B C
> Location1 15 73 123 <- this row is the sum of the following 3
> Under 10 6 42 23
> 10 - 25 4 15 23
> Over 25 5 16 77
> Location2 18 75 113 <- same here
> Under 10 7 45 13
> 10 - 25 5 18 44
> Over 25 6 12 56
>
> I want to get this into R as a collection of data frames, one for each
> of my locations. My questions:
>
> 1. There is a way to handle a collection of data frames, isn't
> there?
> No doubt there are plenty, but what's the easiest way, so that I
> can address them collectively, allowing me to ask such
> questions as
> what's the max of the over 25's in column C?

A list is the best way for that. Then you can use things like lapply and sapply, as I do towards the end of the script that follows.

> 2. What's the easiest way to read such a data array from a text
> file?
> I can do some editing of a csv file produced from the
> spreadsheet,
> but don't really know what to aim for.

Here is the code I used to read your example, which I saved as a comma-separated file, with the only addition that I added the name "Names" to the first column. You will probably need to adjust filename, nlocations and rows.per.location.

filename <- "~/Desktop/rows.txt"
nlocations <- 2
rows.per.location <- 3
data <- read.csv(filename)
data$Names <- gsub("\\s","", data$Names, perl=TRUE) # Trim off whitespace from first column
totals <- data[4*seq_len(nlocations)-3,] # Pick up the rows with the totals

actual.data <- data[-(4*seq_len(nlocations)-3),]     # Pick up the rest
location.names <-  totals[,1]                        # The location  
names are now the first column of totals data.by.location <- split(actual.data, rep(location.names, each=rows.per.location)) # this is the "workhorse" data.by.location <- lapply(data.by.location, function(x) {
   data.frame(x[,-1], row.names=x[,1])                   # Converting  
each list item to a better form
})
totals2 <- sapply(data.by.location, function(x) sapply(x,sum)) all(totals2 == t(totals[,-1])) # Should return true if the totals add up

> 3. Is there some shortcut that would allow me to read this directly
> from a spreadsheet?

Have a look at the xlsReadWrite package.

> Many thanks,
>
> -tom
>

Haris Skiadas
Department of Mathematics and Computer Science Hanover College



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 20 Dec 2007 - 16:08:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 21 Dec 2007 - 16:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.