From: Douglas Bates <bates_at_stat.wisc.edu>

Date: Mon, 12 May 2008 08:30:49 -0500

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 12 May 2008 - 13:34:49 GMT

On Sun, May 11, 2008 at 9:49 AM, amarkos <amarkos_at_gmail.com> wrote:

> On May 11, 4:47 pm, "Douglas Bates" <ba...@stat.wisc.edu> wrote:

>
>> Do you mean that you want to collapse similar rows into a single row
>> and perhaps a count of the number of times that this row occurs?
>
> Let me rephrase the problem by providing an example.
>
> Input:
>
> A =
>       [,1] [,2]
>  [1,]    1    1
>  [2,]    1    3
>  [3,]    2    1
>  [4,]    1    2
>  [5,]    2    1
>  [6,]    1    2
>  [7,]    1    1
>  [8,]    1    2
>  [9,]    1    3
> [10,]    2    1

An important question here is whether you start with two or more variables like the columns of your matrix A. If so, there is a more direct method of getting the answers that you want. The natural way to store such variables in R is as factors. I prefer to use letters instead of numbers to represent the levels of a factor (that way I don't confuse a factor with a numeric variable when I look at rows), so I would create a data frame with two factors instead of a matrix.

> V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
> V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
> df <- data.frame(f1 = V1, f2 = V2)
> df
   f1 f2
1   A  a
2   A  c
3   B  a
4   A  b
5   B  a
6   A  b
7   A  a
8   A  b
9   A  c
10  B  a

You could produce the indicator matrix and check for unique rows, etc. (I will show that below), but all you need is the interaction of the two factors:

> df$f12 <- with(df, f1:f2)[drop = TRUE]
> df
   f1 f2 f12
1   A  a A:a
2   A  c A:c
3   B  a B:a
4   A  b A:b
5   B  a B:a
6   A  b A:b
7   A  a A:a
8   A  b A:b
9   A  c A:c
10  B  a B:a
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
 $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
 $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
> table(df$f12)

A:a A:b A:c B:a
  2   3   2   3
> as.numeric(df$f12)
 [1] 1 3 4 2 4 2 1 2 3 4

Notice that this shows you that there are four distinct combinations, which occur 2, 3, 2 and 3 times respectively. The first combination occurs in rows 1 and 7 and consists of the first level of f1 and the first level of f2, etc.
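Since the original question also asked for the row ids, here is a minimal sketch of one way to read them off the interaction factor. It rebuilds the example data frame so that it runs on its own; the name row_ids is only illustrative, and split() is one of several base R functions that would do the job.

```r
# Rebuild the example data frame used in this message
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
df$f12 <- with(df, f1:f2)[drop = TRUE]

# Group the row numbers by the level of the interaction factor
# (row_ids is a hypothetical name, not from the original message)
row_ids <- split(seq_len(nrow(df)), df$f12)
row_ids

# The counts are the lengths of the groups
counts <- sapply(row_ids, length)
```

Each element of row_ids is named after one combination and holds the rows in which that combination occurs, so the collapsed rows and their ids come from the same object.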

If you really do want the indicator matrix you could generate it as

> (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
   f1A f1B f2a f2b f2c
1    1   0   1   0   0
2    1   0   0   0   1
3    0   1   1   0   0
4    1   0   0   1   0
5    0   1   1   0   0
6    1   0   0   1   0
7    1   0   1   0   0
8    1   0   0   1   0
9    1   0   0   0   1
10   0   1   1   0   0
> unique(ind)
  f1A f1B f2a f2b f2c
1   1   0   1   0   0
2   1   0   0   0   1
3   0   1   1   0   0
4   1   0   0   1   0

but working with the factors is generally much simpler than working with the indicators.
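If you do stay with the indicators, a rough sketch of collapsing identical rows while keeping the row ids might look like the following. Pasting each row into a character key is my own choice here, not something from the original exchange, and the names key, first, Zu and ids are only illustrative.

```r
# Rebuild the example data and the indicator matrix
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df))

# Assumption: a pasted 0/1 string is an adequate key for a row
key <- apply(ind, 1, paste, collapse = "")

# Keep the first occurrence of each distinct row
first <- !duplicated(key)
Zu <- ind[first, , drop = FALSE]

# Row ids for each distinct row, in order of first appearance
ids <- split(seq_len(nrow(ind)), factor(key, levels = key[first]))
```

The k-th element of ids lists the original rows that collapse into the k-th row of Zu.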

> # Indicator matrix
> A <- data.frame(lapply(data.frame(obj), as.factor))
>
> nocases <- dim(obj)[1]
> novars <- dim(obj)[2]
>
> # variable levels
> levels.n <- sapply(obj, nlevels)
> n <- cumsum(levels.n)
>
> # Indicator matrix calculations
> Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
> newdat <- lapply(obj, as.numeric)
> offset <- (c(0, n[-length(n)]))
> for (i in 1:novars)
>   Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
>
> #######
>
> Output:
>
> Z =
>
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    1    0    1    0    0
>  [2,]    1    0    0    0    1
>  [3,]    0    1    1    0    0
>  [4,]    1    0    0    1    0
>  [5,]    0    1    1    0    0
>  [6,]    1    0    0    1    0
>  [7,]    1    0    1    0    0
>  [8,]    1    0    0    1    0
>  [9,]    1    0    0    0    1
> [10,]    2    1
> [10,]    0    1    1    0    0
>
>
> Z is an indicator matrix in the Multiple Correspondence Analysis
> framework.
> My problem is to collapse identical rows (e.g. 2 and 9) into a single
> row and store the row ids.
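
For anyone who wants to run the quoted calculation, here is a self-contained sketch of it. The quoted code never defines obj, so this assumes obj is meant to be the data frame of factors built from the matrix A; that reading is a guess, not something the quoted message states.

```r
# Example input matrix from the quoted message
A <- cbind(c(1,1,2,1,2,1,1,1,1,2), c(1,3,1,2,1,2,1,2,3,1))

# Assumption: obj is the factor version of A (undefined in the quoted code)
obj <- data.frame(lapply(data.frame(A), as.factor))

nocases <- nrow(obj)
novars <- ncol(obj)

# Number of levels per variable and their running total
levels.n <- sapply(obj, nlevels)
n <- cumsum(levels.n)

# Fill the indicator matrix by linear (column-major) indexing:
# row r, column offset[i] + code  ->  r + nocases * (column - 1)
Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
newdat <- lapply(obj, as.numeric)
offset <- c(0, n[-length(n)])
for (i in seq_len(novars))
  Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
Z
```

Under that assumption the result has one 1 per variable in every row, which is what the quoted output Z shows.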

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 12 May 2008 - 19:30:37 GMT.
