From: Rajarshi Guha <rxg218_at_psu.edu>

Date: Tue 05 Apr 2005 - 07:46:23 EST

Rajarshi Guha <rxg218_at_psu.edu> <http://jijo.cjb.net> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE

Q: Why did the mathematician name his dog "Cauchy"? A: Because he left a residue at every pole.

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Tue Apr 05 07:50:35 2005


On Mon, 2005-04-04 at 14:22 -0400, Rajarshi Guha wrote:

> Hi,
>
> I have a set of x,y data points and each data point lies between (0,0)
> and (1,1). Of this set I have selected all those that lie in the lower
> triangle (of the plot of these points).
>
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.

Thanks very much to Deepayan Sarkar, James Holtman and Ray Brownrigg for very efficient (and elegant) solutions. I've summarized them below:

Deepayan Sarkar

A combination of cut and table/xtabs should do it, e.g.:

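x <- runif(3000)
y <- runif(3000)
# 101 break points give 100 cells of side 0.01 along each axis
fx <- cut(x, breaks = seq(0, 1, length = 101))
fy <- cut(y, breaks = seq(0, 1, length = 101))
txy <- xtabs(~ fx + fy)   # 100 x 100 table of counts per cell
image(txy > 0)            # show which cells are occupied
sum(txy > 0)              # number of cells containing at least one point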
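
The original question also restricted the points to the lower triangle before counting. A minimal sketch of that combination follows, assuming "lower triangle" means y < x. The seed and the names lower, breaks and occupied are illustrative and not part of Deepayan's reply.

```r
# Sketch: count occupied 0.01 x 0.01 cells for points in the lower triangle.
# Assumption: "lower triangle" is taken to mean y < x.
set.seed(1)                              # illustrative seed, for reproducibility
x <- runif(3000)
y <- runif(3000)
lower <- y < x                           # keep only points below the diagonal

breaks <- seq(0, 1, length.out = 101)    # 100 cells per axis
fx <- cut(x[lower], breaks = breaks, include.lowest = TRUE)
fy <- cut(y[lower], breaks = breaks, include.lowest = TRUE)

txy <- table(fx, fy)                     # 100 x 100 counts, empty cells included
occupied <- sum(txy > 0)                 # cells holding at least one point
occupied
```

Because cut() returns factors with all 100 levels, the table keeps empty cells, so the count of non-zero entries is the number of occupied cells.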

James Holtman

Here is a start. This creates a data frame and then divides the data up
into 10 segments (you wanted 100, so extend it) and then counts the
number of points in each cell.

> df <- data.frame(x=runif(100), y=runif(100))  # create data
> breaks <- seq(0,1,.1)  # define breaks; you would use 0.01
> # use 'cut' to partition and then 'table' to count
> table(cut(df$x, breaks=breaks, labels=FALSE), cut(df$y, breaks=breaks, labels=FALSE))

      1  2  3  4  5  6  7  8  9 10
  1   0  2  0  1  0  3  0  1  0  0
  2   0  1  0  0  0  2  1  2  0  0
  3   0  1  0  0  3  0  2  2  1  2
  4   0  0  1  2  3  3  1  2  2  0
  5   3  1  2  2  1  2  1  1  1  0
  6   2  0  2  0  0  0  0  1  0  0
  7   0  1  1  1  2  1  1  1  2  1
  8   0  3  2  1  1  2  2  2  1  1
  9   0  0  2  2  0  1  2  0  2  2
  10  0  2  1  0  0  0  0  0  0  3
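
Extending this to cells of side 0.01 raises one wrinkle: with labels=FALSE, cut() returns plain integer codes, and table() then drops any row or column that has no points. A hedged sketch of one way around that is below. Wrapping the codes in factor() is my addition, not part of James's reply, and the names ncell, ix, iy and tab are illustrative.

```r
# Sketch: the same cut/table idea with 100 cells per axis.
set.seed(42)                                  # illustrative seed
df <- data.frame(x = runif(100), y = runif(100))
breaks <- seq(0, 1, length.out = 101)         # spacing of 0.01
ncell <- length(breaks) - 1                   # 100 cells per axis

# Integer cell codes; factor() with explicit levels keeps all 100 rows and
# columns, so empty ones are not dropped by table().
ix <- factor(cut(df$x, breaks = breaks, labels = FALSE), levels = seq_len(ncell))
iy <- factor(cut(df$y, breaks = breaks, labels = FALSE), levels = seq_len(ncell))

tab <- table(ix, iy)                          # full 100 x 100 grid of counts
sum(tab > 0)                                  # number of occupied cells
```

If only the number of occupied cells is needed, the factor() step is optional, since dropped rows and columns contain no points anyway.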

Ray Brownrigg

Another significantly faster way (but not generating row/column names)
is:

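x <- runif(3000)
y <- runif(3000)
ints <- 100

myfun <- function(x, y, ints) {
  fx <- x %/% (1/ints)   # zero-based cell index along x
  fy <- y %/% (1/ints)   # zero-based cell index along y
  # map each (fx, fy) pair to a single cell number and count with hist()
  txy <- hist(fx + ints*fy + 1, breaks=0:(ints*ints), plot=FALSE)$counts
  dim(txy) <- c(ints, ints)   # reshape the counts into an ints x ints matrix
  return(txy)
}

myfun(x, y, ints)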
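
The same linear-index idea can probably be written with tabulate() in place of hist(). The sketch below is a variation of mine, not Ray's code: it swaps in floor(x * ints) for the integer division, and adds pmin() to guard against a coordinate of exactly 1. The name count_cells is hypothetical.

```r
# Sketch: per-cell counts via a linear cell index and tabulate().
set.seed(7)                                   # illustrative seed
x <- runif(3000)
y <- runif(3000)
ints <- 100

count_cells <- function(x, y, ints) {
  # Zero-based cell indices; pmin() keeps a value of exactly 1 in the last cell
  fx <- pmin(floor(x * ints), ints - 1)
  fy <- pmin(floor(y * ints), ints - 1)
  # One linear index per cell (1 .. ints^2), counted with tabulate()
  counts <- tabulate(fx + ints * fy + 1, nbins = ints * ints)
  dim(counts) <- c(ints, ints)                # rows follow x, columns follow y
  counts
}

txy <- count_cells(x, y, ints)
sum(txy > 0)                                  # number of occupied cells
```

As with the hist() version, the result carries no row or column names.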


This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:01 EST