[R] Need ideas on how to show spikes in my data and how to code it in R

From: Thomas Fröjd <tfrojd_at_gmail.com>
Date: Mon, 23 Jun 2008 21:40:48 +0200


I have recently been analyzing birth weight data from a clinic. The data have an obvious defect: digit preference for certain weights, which makes those weights overrepresented. This shows up as spikes in the histogram at well-rounded weights such as 2, 2.5, 3, etc. I would like to show this to government officials, but I can't figure out how to present the finding in an easy-to-understand manner.

My idea is this:

I have a dataset of 20 000 childbirths from another nation that I would like to plot over the histogram of birth weights from the clinic. This dataset doesn't share the digit preference problem. The idea is similar to how people sometimes plot a fitted normal density function over a histogram to show how the data are distributed.
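The familiar normal-curve overlay mentioned here could look roughly like the sketch below in R. It runs on simulated weights in grams, not on the real clinic data, and the names `clinic` and `overlay_normal` are placeholders invented for the illustration.

```r
# Simulated stand-in for the clinic birth weights (grams); not real data.
set.seed(1)
clinic <- rnorm(3000, mean = 3400, sd = 500)

# Draw a histogram on the density scale and add a fitted normal curve.
overlay_normal <- function(w, ...) {
  m <- mean(w)  # computed first: inside curve() the symbol x is the plotting grid
  s <- sd(w)
  h <- hist(w, freq = FALSE, ...)
  curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
  invisible(h)
}

overlay_normal(clinic, breaks = 30,
               main = "Birth weights with fitted normal density",
               xlab = "Birth weight (g)")
```

The histogram is drawn with `freq = FALSE` so that the bars and the curve share the same density scale.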

To do this I need to do three steps, none of which I have managed so far:

  1. Shift the mean and standard deviation of the reference dataset to the mean and standard deviation of my clinic birth weight data.
  2. Scale the data so they can be plotted on the same axis. The reference dataset has around 20 000 observations and my data from the clinic only around 3000, so I have to correct for this; otherwise the reference dataset will appear much larger in the graph.
  3. Plot both on the same graph: the reference dataset as a density plot and my dataset as a histogram, that is, weight bins on the x axis and number of observations on the y axis. It should be added that my reference dataset isn't truly continuous but recorded at 100 g intervals. This means both datasets have the same grouping; however, plotting both as histograms would probably make the graph harder to understand for a person with little training in statistics. The reference dataset "density function" therefore has to be smoothed somehow.
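A minimal sketch of the three steps in R follows. Everything in it is an assumption made for illustration: both datasets are simulated (weights in grams, with 30% of the clinic weights heaped on multiples of 500 g), the names `reference`, `clinic`, `match_moments`, and `round100` are hypothetical, and the kernel bandwidth of one bin width is just one possible smoothing choice.

```r
# Simulated stand-ins (grams); replace with the real vectors.
set.seed(42)
round100 <- function(g) round(g / 100) * 100
reference <- round100(rnorm(20000, mean = 3600, sd = 550))
clinic <- rnorm(3000, mean = 3400, sd = 500)
# Mimic digit preference: pull about 30% of clinic weights to the nearest 500 g.
heaped <- runif(length(clinic)) < 0.3
clinic <- ifelse(heaped, round(clinic / 500) * 500, round100(clinic))

# Step 1: shift and rescale the reference to the clinic mean and SD.
match_moments <- function(ref, target) {
  (ref - mean(ref)) / sd(ref) * sd(target) + mean(target)
}
ref_adj <- match_moments(reference, clinic)

# Step 2: smooth the reference with a kernel density estimate and convert
# density to expected counts per bin: density * n_clinic * bin width.
bin_width <- 100
d <- density(ref_adj, bw = bin_width)  # bandwidth choice is an assumption
expected <- d$y * length(clinic) * bin_width

# Step 3: clinic histogram in counts, 100 g bins centred on the recorded
# values, with the smoothed reference curve drawn on top.
breaks <- seq(min(clinic) - bin_width / 2, max(clinic) + bin_width / 2,
              by = bin_width)
h <- hist(clinic, breaks = breaks, col = "grey85", border = "white",
          main = "Clinic birth weights vs. smoothed reference",
          xlab = "Birth weight (g)", ylab = "Number of births")
lines(d$x, expected, lwd = 2)
```

Multiplying the density by the number of clinic observations and the bin width puts the curve on the same count scale as the histogram, so the difference in sample size no longer matters. One caveat: rescaling in step 1 also changes the 100 g spacing of the reference values, which is part of why the bandwidth is set at least as wide as one bin.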

I would be very thankful for help on any of those steps. Also if you think this approach is wrong for some reason please tell me.

Best regards

Thomas Fröjd

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Mon 23 Jun 2008 - 19:43:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 23 Jun 2008 - 21:31:27 GMT.
