[R] Operating on count lists of non-equal lengths

From: Kari Manninen <kari_at_econadvisor.com>
Date: Sun, 09 Jan 2011 00:19:51 -0600


This is my first post to R-help, and I look forward to receiving some advice for a novice like me...

I’ve got simple repeated survey data (10 questions, 4 periods so far) that is very easy to work with in Excel. However, I’d like to move the compilation to R, but I’m having some trouble operating on count list data in a neat way.

The data C
> str(C)
'data.frame': 551 obs. of 13 variables:
$ TIME : int 1 1 1 1 1 1 1 1 1 1 ...
$ Sector : Factor w/ 6 levels "D","F","G","H",..: 1 1 1 1 1 1 1 1 1 1 ...
$ COMP : Factor w/ 196 levels " (_____ __ _____) ",..: 73 133 128 109 153 147 56 26 142 34 ...
$ Q1 : int 0 0 1 1 0 -1 -1 1 1 -1 ...
$ Q2 : int 0 0 0 -1 0 -1 0 0 1 -1 ...
$ Q3 : int 0 0 0 1 0 -1 -1 1 1 -1 ...
$ Q4 : int -1 0 0 0 0 -1 0 -1 0 -1 ...
$ Q5 : int 0 0 0 -1 0 -1 0 -1 0 0 ...
$ Q6 : int 0 0 0 1 0 -1 0 -1 0 0 ...
$ Q7 : int 0 1 1 0 0 0 1 0 1 1 ...
$ Q8 : int 0 0 0 0 0 -1 0 0 1 0 ...
$ Q9 : int 0 1 0 0 0 -1 0 -1 1 -1 ...
$ Q10 : int 0 0 0 0 -1 -1 0 -1 0 0 ...

> summary(C)

      TIME       Sector    COMP          Q1                Q2
  Min.   :1.000   D:130   A:  4   Min.   :-1.000   Min.   :-1.0000
  1st Qu.:2.000   F:126   B:  4   1st Qu.: 0.000   1st Qu.: 0.0000
  Median :3.000   G:158   C:  4   Median : 1.000   Median : 0.0000
  Mean   :2.684   H: 26   D:  4   Mean   : 0.446   Mean   : 0.2178
  3rd Qu.:4.000   I: 20   E:  4   3rd Qu.: 1.000   3rd Qu.: 1.0000
  Max.   :4.000   J: 91   F:  4   Max.   : 1.000   Max.   : 1.0000
                    (Other):527   NA's   :60.000   NA's   :69.0000

The aim is to produce balance scores from the shares of positive and negative answers in the data. The first step is to count the -1, 0 and 1 answers (negative, neutral, positive) and the missing values, NA (it would be so much simpler without the missing values), for each question Q1-Q10, for each period (TIME) and each of the 6 Sectors:

b<-apply(C[,4:13], 2, function (x) tapply(x,C[,1:2], count))

I know that b is a list with one 4 x 6 array (TIME by Sector) per question, where each ‘cell’ is a count list.

For example, for Question 1, Time period 2, Sector 1:
> str(b$Q1[2,1])
List of 1
$ :'data.frame': 4 obs. of 2 variables:
    ..$ x : int [1:4] -1 0 1 NA
    ..$ freq : int [1:4] 3 9 12 2
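
The count lists have unequal lengths because a category that never occurs in a cell is simply left out of that cell's counts. A minimal base-R sketch of one way around this, assuming the answers are coded -1, 0, 1 and NA as above: convert each answer vector to a factor with fixed levels before tabulating. The data frame `toy` and the helper `count_answers` are hypothetical names used only for illustration.

```r
# Hypothetical toy data standing in for C (same layout, one question only).
toy <- data.frame(
  TIME   = rep(1:2, each = 6),
  Sector = factor(rep(c("D", "F"), times = 6)),
  Q1     = c(1, 0, -1, NA, 1, 1, 0, 0, -1, 1, NA, 1)
)

# Count one answer vector over a fixed set of categories, so every result
# has length 4 (-1, 0, 1, NA) even when a category does not occur.
count_answers <- function(x) {
  table(factor(x, levels = c(-1, 0, 1)), useNA = "always")
}

# One fixed-length count vector per TIME x Sector cell.
counts_q1 <- tapply(toy$Q1, toy[, c("TIME", "Sector")], count_answers)

# Counts for TIME 1, Sector D, in the order -1, 0, 1, NA.
counts_q1[[1, 1]]
```

Because every cell now has the same four categories in the same order, the counts can be added across questions directly.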

Now I would like to group questions (C[, 4:6], C[, 7], C[, 8:9], C[, 10:11] and C[, 12:13]), sum the counts (-1, 0, 1) within these groups and present them in percentage terms. I don’t know how to do this efficiently for the whole data. I would not like to go through each cell separately…
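
The grouping step can be sketched in base R by pooling the answers of all questions in a group within each TIME x Sector cell and tabulating the pooled vector. This is only an illustration under assumptions: `toy`, `groups` and `group_shares` are hypothetical names, the toy data has four questions instead of ten, and missing answers are dropped before the percentages are taken.

```r
# Hypothetical toy data with the same layout as C, but only 4 questions.
toy <- data.frame(
  TIME   = c(1, 1, 1, 1, 2, 2, 2, 2),
  Sector = c("D", "D", "F", "F", "D", "D", "F", "F"),
  Q1     = c( 1,  0, -1,  1,  1,  1,  0, NA),
  Q2     = c( 1, -1,  0,  1, NA,  1,  0, -1),
  Q3     = c( 0,  0,  1, -1,  1, -1,  1,  1),
  Q4     = c(NA,  1,  1,  0, -1,  0,  1,  1)
)

# Assumed grouping for illustration only; the real data would use the
# five column groups of C listed above.
groups <- list(G1 = c("Q1", "Q2", "Q3"), G2 = "Q4")

# Percentage shares of -1, 0 and 1 among the non-missing answers of a
# group, pooled over the group's questions, for each TIME x Sector cell.
group_shares <- function(data, cols) {
  cells <- split(data[, cols, drop = FALSE], list(data$TIME, data$Sector))
  t(sapply(cells, function(d) {
    x <- factor(unlist(d), levels = c(-1, 0, 1))
    100 * prop.table(table(x))  # table() drops NA by default
  }))
}

shares <- lapply(groups, group_shares, data = toy)
round(shares$G1, 1)
```

Each element of `shares` is a matrix with one row per TIME.Sector combination and one column per answer category. A cell with no non-missing answers would give NaN shares.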

Then I’d give each group a balance score based on something like:

Score = 100 + 100 * (pos% - neg%) for each group, by TIME and Sector, excluding the missing observations.

### This is not working (pseudocode)
Score <- 100 + 100 * [ sum(count(x == 1)) / sum(count(x in (-1, 0, 1))) - sum(count(x == -1)) / sum(count(x in (-1, 0, 1))) ] for each of the 5 groups defined above, by TIME and Sector
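
A minimal sketch of how the score formula could be written as working base-R code, assuming that pos% and neg% are shares (between 0 and 1) of the non-missing answers, so the score runs from 0 to 200. The names `balance_score`, `toy` and `long` are hypothetical, and the toy group contains only two questions.

```r
# Balance score for a vector of answers coded -1, 0, 1 (NA = missing):
# 100 + 100 * (share of positives - share of negatives), with both
# shares taken over the non-missing answers only.
balance_score <- function(x) {
  x <- x[!is.na(x)]
  100 + 100 * (mean(x == 1) - mean(x == -1))
}

# Hypothetical toy data in the layout of C, with one group of two questions.
toy <- data.frame(
  TIME   = c(1, 1, 1, 2, 2, 2),
  Sector = c("D", "D", "F", "D", "F", "F"),
  Q1     = c(1, -1,  1, 0, NA, 1),
  Q2     = c(1,  0, NA, 1, -1, 1)
)

# Pool the answers of the group's questions into one long column,
# then score each TIME x Sector cell.
long <- data.frame(
  TIME   = rep(toy$TIME, times = 2),
  Sector = rep(toy$Sector, times = 2),
  answer = c(toy$Q1, toy$Q2)
)
scores <- tapply(long$answer, long[, c("TIME", "Sector")], balance_score)
scores
```

With this coding the difference between the two shares equals the mean of the non-missing answers, so `100 + 100 * mean(x, na.rm = TRUE)` gives the same score.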

I would greatly appreciate your help on this.

Regards,
- Kari Manninen



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 09 Jan 2011 - 06:25:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 22 Jan 2011 - 14:50:07 GMT.
