# Re: [R] Probability of exceedance function question

From: Thomas P. Colson <tpcolson_at_ncsu.edu>
Date: Sun 08 Oct 2006 - 21:21:44 GMT

area <- read.table(file = "c:/temp/area.asc", sep = " ", na.strings = "-9999", skip = 6)
area <- area[,-ncol(area)]
xLLcorner <- 1985649.0700408898
yLLcorner <- 841301.04004059616
cellsize <-20

```xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1))
xLRcorner <- xURcorner
xULcorner <- xLLcorner
yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1))
yURcorner <- yULcorner
yLRcorner <- yLLcorner
```

coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -20),x = seq(xULcorner, xLRcorner, by = 20))
area<- data.frame(coordsa, tmin = as.vector(c(area,recursive = T))) names(area)<-c("x","y","z")
Plot(ecdf(area\$z))

-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand@nhh.no] Sent: Sunday, October 08, 2006 4:37 PM
To: Thomas P. Colson
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Probability of exceedance function question

On Sun, 8 Oct 2006, Thomas P. Colson wrote:

> I'm trying to calculate a cumulative area distribution (graph) of
> drainage areas. This is defined as P(A > A*). Simple in principle. I
> can do this in excel, with "COUNTIF", which will count the number of
> cells in the row "area" that have area A, then determine, for each
> cell in the row "area, how many cells exceede that area, then dividing
> that number by the total number of cells, which gives me the
> probability that drainage area A exceeds drainage area A*.

Is this ecdf() of the vector or its suitable subset? If so, it runs very fast even for large data sets. For plotting, bear in mind that you are generating a lot of output, though:

> t0 <- runif(100000)
> system.time(t1 <- ecdf(t0))
[1] 0.222 0.022 0.248 0.000 0.000
> system.time(plot(t1, pch="."))
[1] 1.089 0.079 1.186 0.000 0.000

>
> E.g, drainage area of 6 sq meters (One DEM grid cell) has a high
> probability of exceedance(.99), while a drainage area of 100,000
> square meters has a low probability of exceedance (.001).
>
> I wish to plot this relationship, and we all know that excel is not
> the tool of choice when working with hundreds of thousands of records.
> I'd like to port the CAD into a few R functions that I've already
> developed for other tests as well.
>
> So my challenge, in R, is to
> (1)count the number of rows in column "Area" that have AREA(*),
>
> (2) determine, by row, how many rows have an area greater than the
> area given in that one row
>
> (3) divide step 2 by number of rows (how can I do a row count and port
> that to a variable, as I have to do this on 10 datasets?)
>
> Thanks for any advice you can offer to this endevour
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

```--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand@nhh.no

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help