From: Petr Savicky <savicky_at_praha1.ff.cuni.cz>

Date: Mon, 04 Apr 2011 11:37:25 +0200


R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 04 Apr 2011 - 09:40:01 GMT


On Mon, Apr 04, 2011 at 01:11:37AM -0500, psombe wrote:

> Hi,
>
> I'm new to R and trying to do some simple analysis. I have a data set with
> about 88000 transactions and I want to perform a simple support count
> analysis of an itemset which is, say, not a complete transaction but a subset
> of a transaction.
> Say
>
> {A,B,D} is a transaction and I want to find the support of {A,B} even though it
> never occurs as only A,B in the entire set.
>
> To do this I needed to create a new itemsets class and then use the support
> function, but somehow the answers never seem to tally.

Hi.

The answer depends on the representation of the data set. Can you describe the representation?

One possible representation of a data set for itemset counting is a 0/1 matrix with one row per transaction and one column per item. With this representation, the support can be computed as follows.

db <- matrix(0, nrow=5, ncol=5, dimnames=list(NULL, LETTERS[1:5]))

db[1, c("A", "B", "D")] <- 1
db[2, c("A", "B")] <- 1
db[3, c("A", "D", "E")] <- 1
db[4, c("B", "C", "D")] <- 1
db[5, c("A", "B", "C")] <- 1

db

     A B C D E
[1,] 1 1 0 1 0
[2,] 1 1 0 0 0
[3,] 1 0 0 1 1
[4,] 0 1 1 1 0
[5,] 1 1 1 0 0
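The matrix above is typed in row by row. The question does not say how the 88000 transactions are stored; assuming (hypothetically) that they are available as a list of character vectors, one per transaction, a sketch of building the same 0/1 matrix without a loop could look like this. The names `transactions`, `items` and `db2` are illustrative only.

```r
# Hypothetical input: each transaction as a character vector of item labels,
# mirroring the five rows typed in by hand above.
transactions <- list(c("A", "B", "D"),
                     c("A", "B"),
                     c("A", "D", "E"),
                     c("B", "C", "D"),
                     c("A", "B", "C"))

# All distinct item labels, used as column names.
items <- sort(unique(unlist(transactions)))

# Start from an all-zero matrix: one row per transaction, one column per item.
db2 <- matrix(0, nrow = length(transactions), ncol = length(items),
              dimnames = list(NULL, items))

# Put a 1 at (transaction, item) for every item occurring in a transaction,
# using a two-column index matrix instead of an explicit loop.
row_idx <- rep(seq_along(transactions), lengths(transactions))
col_idx <- match(unlist(transactions), items)
db2[cbind(row_idx, col_idx)] <- 1

db2
```

`lengths()` requires R 3.2.0 or later. With many distinct items a dense matrix can become large; a sparse matrix (for example from the Matrix package) may be a better fit in that case.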

itemset <- c("A", "B")

# for each transaction, whether it contains c("A", "B")
rowSums(db[, itemset]) == length(itemset)

[1]  TRUE  TRUE FALSE FALSE  TRUE
# the number of transactions containing c("A", "B")
sum(rowSums(db[, itemset]) == length(itemset))

[1] 3
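The same computation can be wrapped in a small helper. The sketch below is an assumption-laden illustration, not code from any package: `support_count()` is a hypothetical name. It adds `drop = FALSE` so that a single-item itemset still yields a one-column matrix (otherwise `db[, "C"]` is a plain vector and `rowSums()` fails), and optionally returns the relative support, i.e. the fraction of transactions containing the itemset. The last lines show `crossprod()`, which counts co-occurrences for all pairs of items at once.

```r
# Absolute support (count) or relative support (fraction) of 'itemset'
# in a 0/1 transaction-by-item matrix 'db'. Hypothetical helper name.
support_count <- function(db, itemset, relative = FALSE) {
  # drop = FALSE keeps a matrix even for a one-item itemset,
  # so rowSums() still applies.
  contains <- rowSums(db[, itemset, drop = FALSE]) == length(itemset)
  if (relative) mean(contains) else sum(contains)
}

db <- matrix(0, nrow = 5, ncol = 5, dimnames = list(NULL, LETTERS[1:5]))
db[1, c("A", "B", "D")] <- 1
db[2, c("A", "B")] <- 1
db[3, c("A", "D", "E")] <- 1
db[4, c("B", "C", "D")] <- 1
db[5, c("A", "B", "C")] <- 1

support_count(db, c("A", "B"))                   # 3
support_count(db, c("A", "B"), relative = TRUE)  # 0.6
support_count(db, "C")                           # 2

# For all pairs at once: entry [i, j] of crossprod(db) = t(db) %*% db counts
# transactions containing both item i and item j; the diagonal holds
# single-item counts.
pair_counts <- crossprod(db)
pair_counts["A", "B"]                            # 3
```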

Hope this helps.

Petr Savicky.
