Re: [R] Probability weights with density estimation

From: Charles C. Berry <>
Date: Wed, 16 Jan 2008 10:32:09 -0800

On Wed, 16 Jan 2008, David Winsemius wrote:

> I am a physician examining an NHANES dataset available at the NCHS
> website:
> Thank you to the R authors and the foreign package authors in
> particular. Importing from the SAS export format file was a snap. It
> consists of demographic data linked to laboratory measurements. Each
> subject has an associated sampling weight. I have gotten informative
> displays following the examples using kde2d() in V&R MASSe2 (more
> thanks), but these were unweighted analyses. The ratio of total
> cholesterol (TC) to HDL cholesterol is used clinically to estimate risk
> of future heart disease, and I am looking at how such ratios "divide"
> or intersect with the TC x HDL-C distribution. Rather than include all
> the real data, let me just post a simulation that shows a contour plot
> reasonably similar to what I am seeing.
> library(MASS)  # provides kde2d()
> TC.ran <- exp(rnorm(400,1.5,.3))
> HDL.ran <- exp(rnorm(400,.4,.3) )
> f1<-kde2d(HDL.ran,TC.ran,n=25,lims=c(0,4,2,10))
> contour(f1$x,f1$y,f1$z,ylim=c(0,8),xlim=c(0,3),ylab="TC mmol/L",
>         xlab="HDL mmol/L")
> lines(f1$x,5*f1$x) # iso-ratio lines
> lines(f1$x,4*f1$x)
> lines(f1$x,3*f1$x)
> Two questions:
> Is there a 2d density estimation function that has provision for
> probability weights (or inverse sampling probabilities)? I seem to
> remember a discussion on the list about whether such a procedure would
> be meaningful, but my searches cannot locate that thread or any worked
> examples that incorporate sampling weights.

It looks like you can use bkde2D from the KernSmooth package.

For details, you might look at the function sqlocpoly in surveyNG, which uses the KernSmooth package.
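To make the idea of a weighted 2d density concrete, here is a minimal sketch that does not rely on any package's weighting support. It is a hand-rolled Gaussian product-kernel estimator in which each observation contributes in proportion to its sampling weight. The function name wkde2d, the weights wt and the use of bw.nrd() for the bandwidths are all assumptions made for illustration; none of them comes from MASS, KernSmooth or surveyNG.

```r
# Hypothetical helper, not part of MASS or KernSmooth: a weighted
# Gaussian product-kernel density estimate on an n x n grid.
wkde2d <- function(x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  w  <- w / sum(w)                            # weights rescaled to sum to 1
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  # kx[j, i]: Gaussian kernel centred at x[i] (sd h[1]), evaluated at gx[j]
  kx <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = h[1]))
  ky <- outer(gy, y, function(g, yi) dnorm(g, mean = yi, sd = h[2]))
  # z[j, k] = sum_i w[i] * kx[j, i] * ky[k, i]
  z  <- kx %*% (w * t(ky))
  list(x = gx, y = gy, z = z)
}

set.seed(1)
TC.ran  <- exp(rnorm(400, 1.5, 0.3))
HDL.ran <- exp(rnorm(400, 0.4, 0.3))
wt      <- runif(400, 0.5, 2)                 # made-up sampling weights
h       <- c(bw.nrd(HDL.ran), bw.nrd(TC.ran)) # rule-of-thumb bandwidths
fw      <- wkde2d(HDL.ran, TC.ran, wt, h, n = 25, lims = c(0, 4, 2, 10))
```

The result has the same x, y, z layout as the output of kde2d(), so contour(fw$x, fw$y, fw$z) should draw it in the same way. With equal weights the estimator reduces to an ordinary unweighted kernel density estimate.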

> If there is such a function, would it be a simple matter to calculate
> the proportion of the total population that would be expected to have a
> ratio of y.ran/x.ran of less than a particular number, say 4.0?

Maybe my eyesight is failing, but I did not see where you define 'y.ran' and 'x.ran'. If they, like 'TC.ran' and 'HDL.ran', are just variables that are directly measured in your survey, then estimating the proportion less than a given value for y.ran/x.ran is standard survey sampling fare and no density estimation is needed. In that case, the 'survey' package on CRAN is what you want.
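As a rough illustration of that point, the sketch below estimates the weighted proportion with a ratio under 4.0 directly, assuming that y.ran/x.ran means TC/HDL. The data, the weights wt and the helper wprop are made up for the example. The survey calls run only if the package is installed, and ids = ~1 is a simplifying assumption that ignores the strata and clusters a real NHANES design would need.

```r
# Made-up data standing in for the survey variables.
set.seed(1)
dat <- data.frame(TC  = exp(rnorm(400, 1.5, 0.3)),
                  HDL = exp(rnorm(400, 0.4, 0.3)),
                  wt  = runif(400, 0.5, 2))   # made-up sampling weights
dat$low <- as.numeric(dat$TC / dat$HDL < 4)   # 1 if ratio below 4.0

# Weighted proportion: share of the total weight with ratio < 4.
wprop <- function(ind, w) sum(w * ind) / sum(w)
p.low <- wprop(dat$low, dat$wt)

# With the survey package installed, the same point estimate plus a
# design-based standard error. ids = ~1 assumes no clustering or strata,
# which a real NHANES design would need to specify.
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~wt, data = dat)
  print(survey::svymean(~low, des))
}
```

The base-R line gives only the point estimate; the survey package is what supplies standard errors that respect the sampling design.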

HTH, Chuck

> --
> Respectfully;
> David Winsemius
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
                                             UC San Diego
                                             La Jolla, San Diego 92093-0901

Received on Wed 16 Jan 2008 - 18:37:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 18 Jan 2008 - 05:30:08 GMT.
