Re: [R] Probability of exceedance function question

From: Thomas P. Colson <>
Date: Sun 08 Oct 2006 - 21:21:44 GMT

I am able to convert the flow accumulation grid into an area (for each pixel) grid, then import this into R as an ASCII file. The plot(ecdf) function in R seems to plot the opposite: curve starts at probability 0, for drainage area 0, should be the other way?

About 150,000 data points in these sets, ecdf curve plots in about 15 seconds.

Could the problem be, how I'm importing the data from ascii grid? Cellsize is 20 ft and z is the drainage area, for each cell (flow weighted)

 area <- read.table(file = "c:/temp/area.asc", sep = " ", na.strings = "-9999", skip = 6)
area <- area[,-ncol(area)]
xLLcorner <- 1985649.0700408898
yLLcorner <- 841301.04004059616
cellsize <-20

xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1)) 
xLRcorner <- xURcorner 
xULcorner <- xLLcorner
yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1)) 
yURcorner <- yULcorner 
yLRcorner <- yLLcorner 

coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -20),x = seq(xULcorner, xLRcorner, by = 20))
area<- data.frame(coordsa, tmin = as.vector(c(area,recursive = T))) names(area)<-c("x","y","z")

-----Original Message-----
From: Roger Bivand [] Sent: Sunday, October 08, 2006 4:37 PM
To: Thomas P. Colson
Subject: Re: [R] Probability of exceedance function question

On Sun, 8 Oct 2006, Thomas P. Colson wrote:

> I'm trying to calculate a cumulative area distribution (graph) of
> drainage areas. This is defined as P(A > A*). Simple in principle. I
> can do this in excel, with "COUNTIF", which will count the number of
> cells in the row "area" that have area A, then determine, for each
> cell in the row "area, how many cells exceede that area, then dividing
> that number by the total number of cells, which gives me the
> probability that drainage area A exceeds drainage area A*.

Is this ecdf() of the vector or its suitable subset? If so, it runs very fast even for large data sets. For plotting, bear in mind that you are generating a lot of output, though:

> t0 <- runif(100000)
> system.time(t1 <- ecdf(t0))
[1] 0.222 0.022 0.248 0.000 0.000
> system.time(plot(t1, pch="."))
[1] 1.089 0.079 1.186 0.000 0.000

isn't at all bad!

> E.g, drainage area of 6 sq meters (One DEM grid cell) has a high
> probability of exceedance(.99), while a drainage area of 100,000
> square meters has a low probability of exceedance (.001).
> I wish to plot this relationship, and we all know that excel is not
> the tool of choice when working with hundreds of thousands of records.
> I'd like to port the CAD into a few R functions that I've already
> developed for other tests as well.
> So my challenge, in R, is to
> (1)count the number of rows in column "Area" that have AREA(*),
> (2) determine, by row, how many rows have an area greater than the
> area given in that one row
> (3) divide step 2 by number of rows (how can I do a row count and port
> that to a variable, as I have to do this on 10 datasets?)
> Thanks for any advice you can offer to this endevour
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43

______________________________________________ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
Received on Mon Oct 09 07:23:23 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 09 Oct 2006 - 09:30:09 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.