[R] working with summarized data

From: Rick Bischoff <rdbisch_at_gmail.com>
Date: Thu 31 Aug 2006 - 00:27:58 EST


The data sets I am working with all have a weight variable, i.e., each row does not represent a single observation.

With that in mind, nearly all of the graphs and summary statistics are incorrect for my data, because they don't take the weights into account.



For example, median() gives the wrong answer for my data, because the quantiles aren't calculated with weights:

sum( weights[X < median(X)] ) / sum(weights)

This should be 0.5... of course it's not.
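
To make the problem concrete, here is a minimal sketch with made-up data. The values in X and weights are hypothetical, and weighted_median() is an illustrative helper, not a base R function. It uses one possible definition (the smallest value whose cumulative weight share reaches 0.5); other conventions interpolate between values.

```r
# Hypothetical example data: values X with frequency weights
X <- c(1, 2, 3, 4, 5)
weights <- c(10, 1, 1, 1, 1)

# Share of the total weight that lies strictly below the unweighted median
share_below <- sum(weights[X < median(X)]) / sum(weights)
share_below

# Illustrative helper (an assumption, not part of base R):
# smallest x whose cumulative weight share reaches 0.5
weighted_median <- function(x, w) {
  ord <- order(x)                       # sort values
  cum_share <- cumsum(w[ord]) / sum(w)  # cumulative share of total weight
  x[ord][which(cum_share >= 0.5)[1]]    # first value reaching one half
}

weighted_median(X, weights)
```

With these weights most of the mass sits on the smallest value, so the unweighted median(X) lands well above the point that splits the total weight in half.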


Unfortunately, it seems that most (all?) of R's graphics and summary statistic functions don't take a weight or frequency argument. (Fortunately, the models do...)

Am I completely missing how to do this? One way would be to replicate each row in proportion to its weight (e.g., if the weight was 4, we would add 3 additional copies), but this will get prohibitive pretty quickly as the dataset grows.
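
The replication idea can be sketched as follows, assuming integer frequency weights and made-up data (X and weights here are hypothetical). rep() with a times vector repeats each value by its weight, after which the ordinary unweighted functions apply. For the mean, base R's weighted.mean() gives the same answer without expanding anything.

```r
# Hypothetical data with integer frequency weights
X <- c(2.5, 4.0, 7.1)
weights <- c(4, 1, 2)

# Repeat each value according to its weight (weight 4 -> 4 copies in total)
expanded <- rep(X, times = weights)
length(expanded)   # same as sum(weights)

# Ordinary summaries now reflect the weights
median(expanded)
mean(expanded)

# The mean can be had directly, with no expansion
weighted.mean(X, weights)
```

The expanded vector has sum(weights) elements, which is exactly why this approach becomes expensive for large weights or large data sets.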

Thanks in advance!



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu Aug 31 04:06:23 2006

