# [R] Histograms, density, and relative frequencies

From: Bret Collier <bacolli_at_uark.edu>
Date: Thu 08 Jul 2004 - 03:29:40 EST

R-users,

I have been using R for about 1 year, and I have run across a couple of graphics problems that I am not quite sure how to address. I have read the email threads on the R list regarding the differences between density and relative frequencies (count/sum(count)), and I am hoping that someone could provide me with some advice/comments concerning my approach. I will admit that some of the underlying mathematics of the density discussion is beyond my current understanding, but I am looking into it.
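For reference, the relationship between the two quantities can be sketched as follows. This is a minimal illustration on simulated data, not the real data set: the vector x and the beta parameters are assumptions made up for the example. For a hist object, density is the relative frequency divided by the bin width, so the two agree only when every bin has width 1.

```r
set.seed(1)
x  <- rbeta(1000, 2, 15)   # hypothetical values in (0, 1), for illustration only
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

h        <- hist(x, breaks = bk, plot = FALSE)
rel_freq <- h$counts / sum(h$counts)   # proportion of observations per bin
width    <- diff(h$breaks)             # bin widths (the last bin is much wider)

# density is relative frequency per unit of x
all.equal(h$density, rel_freq / width)

sum(rel_freq)            # relative frequencies sum to 1
sum(h$density * width)   # density integrates to 1 over the bins
```

With unequal bin widths, as here, the two scales give differently shaped plots, which is one reason the distinction matters.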

I have a data set (600,000 obs) used to parameterize a probabilistic causal model, where each obs is a population response for one of 2 classes (either regs1 or regs2). I have been attempting to create 1 marginal probability plot with 2 lines (one for each class). Using my rather rough code, I created a plot that seems to follow the commonly used (although, from what I can understand, wrong) relative frequency histogram approach.

My rough code looks like this:

bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)
par(mfrow=c(1, 1))
fawn1 <- hist(MFAWNRESID[regs1], plot=FALSE, breaks=bk)
fawn2 <- hist(MFAWNRESID[regs2], plot=FALSE, breaks=bk)
count1 <- fawn1$counts/sum(fawn1$counts)
count2 <- fawn2$counts/sum(fawn2$counts)

b <- c(0, .05, .1, .15, .2, .25, .3, .35)
plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40), pch=".", bty="l")
lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
axis(side=1, at=c(0, .05, .1, .15, .2, .25, .3, .35))

Using the above, I get relative frequency values for regs1 that look like this (they match the output of my probabilistic model):
> count1
[1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
[6] 4.698426e-03 4.488942e-04 4.322685e-05

First, the first value of count1 is the relative frequency of occurrence within the range 0-0.05, but when plotted it appears at b=0 and so does not really represent the range. Are there any suggestions on a technique to approach this?
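One possible direction, offered only as a sketch on simulated data (x and its beta parameters are assumptions for illustration): the object returned by hist() also stores the centre of each bin in its mids component, so the relative frequencies could be plotted against those centres instead of the left bin edges.

```r
set.seed(1)
x  <- rbeta(1000, 2, 15)   # hypothetical values in (0, 1), for illustration only
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

h        <- hist(x, breaks = bk, plot = FALSE)
rel_freq <- h$counts / sum(h$counts)

h$mids   # centre of each bin; the first is 0.025 rather than 0

# plot each relative frequency at the centre of the range it summarises
plot(h$mids, rel_freq, type = "b", bty = "l",
     xlab = "bin midpoint", ylab = "relative frequency")
```

Note that the last bin (0.35 to 1) has its midpoint at 0.675, so a wide catch-all bin may still need separate handling.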

Next: using the above code, the x-axis labels end at 0.35, but the axis itself continues (because bk ends at 1?). While there is a chance of occurrence past .35, it is low, and I want to extend the lines to about .35 and clip the x-axis there. However, I have been unable to figure out how to clip it. Could someone point me in the right direction?
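As a hedged sketch of one thing to try: in base graphics the drawn extent of the x-axis is set by the xlim argument of plot() (c(0, .5) in the code above), not by the breaks passed to hist(). The values in rel_freq below are rounded, made-up numbers for illustration, and xaxs="i" is used to suppress the default padding beyond xlim.

```r
b        <- c(0, .05, .1, .15, .2, .25, .3, .35)
rel_freq <- c(0.12, 0.35, 0.34, 0.16, 0.03, 0.005, 0.0004, 0.00004)  # hypothetical values

# xlim sets the plotted range; xaxs = "i" stops R from extending it by 4%
plot(b, rel_freq, type = "n", xaxt = "n", bty = "l",
     xlim = c(0, .35), xaxs = "i", ylim = c(0, .40))
lines(spline(b, rel_freq), lty = 1, lwd = 2, col = "black")
axis(side = 1, at = b)

par("usr")[1:2]   # actual x-range of the plot region
```

With xaxs="i" the plot region runs exactly from 0 to 0.35, so the axis line stops at the last labelled tick.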

TIA,
Bret A. Collier
Arkansas Cooperative Fish and Wildlife Research Unit
Department of Biological Sciences
University of Arkansas

R-help@stat.math.ethz.ch mailing list