[R] How to deal with missing data?

From: Chaouch, Aziz <achaouch_at_nrcan.gc.ca>
Date: Fri 19 May 2006 - 22:57:07 EST

Hi All,

This question is not directly about R itself; it is about how to deal with missing data. I want to build wind roses, i.e. circular histograms of wind directions and associated speeds, to look for trends or changes in wind patterns over several decades at some meteorological stations. The database I have contains hourly records of wind direction and speed over the past 50 years, so it is huge. Of course there are a lot of missing data, and they are causing problems. Two major problems arise from the temporal distribution of the wind records:

  1. Data are missing because of station shutdowns (consecutive missing data over days, weeks, months and even years for some stations!)
  2. In the past, wind was recorded only during the daytime, whereas recent records cover both day and night
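
One way to see both patterns at a glance is to count the available records per decade and hour of day. The following is only a minimal sketch on made-up timestamps; the function name and the toy data are hypothetical and not taken from the actual database.

```python
from collections import Counter
from datetime import datetime, timedelta

def coverage_by_decade_and_hour(timestamps):
    """Count available records per (decade, hour-of-day) pair."""
    counts = Counter()
    for ts in timestamps:
        decade = (ts.year // 10) * 10
        counts[(decade, ts.hour)] += 1
    return counts

# Toy data (hypothetical): 30 days of daytime-only records in 1960,
# and 30 days of round-the-clock records in 2000.
start_old = datetime(1960, 1, 1)
start_new = datetime(2000, 1, 1)
old = [start_old + timedelta(hours=h) for h in range(24 * 30) if 6 <= h % 24 < 18]
new = [start_new + timedelta(hours=h) for h in range(24 * 30)]

counts = coverage_by_decade_and_hour(old + new)
for decade in (1960, 2000):
    row = [counts[(decade, hour)] for hour in range(24)]
    print(decade, row)
```

Hours with a count of zero in the early decades would point to the daytime-only regime, while whole decades with low counts would point to shutdowns.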

On top of these situations, data can also be missing "at random". The analysis is complicated by the fact that wind direction is a circular variable, so specific tools must be used to handle it. I know there are different ways to deal with missing data, such as multiple imputation, but most assume that the variables are Gaussian. Moreover, when a record is missing in the database, it is missing for all variables, so it is apparently not possible to use other variables to estimate the missing wind records.
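
To illustrate why a circular variable needs specific tools, here is a small sketch comparing the arithmetic mean of two directions with the mean direction obtained by averaging unit vectors. The function name is hypothetical, and the two directions are invented for the example.

```python
import math

def circular_mean_deg(directions_deg):
    """Mean direction in degrees, in [0, 360), from the sum of unit vectors."""
    s = sum(math.sin(math.radians(d)) for d in directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg)
    if math.hypot(s, c) < 1e-12:
        # The unit vectors cancel out, so no mean direction is defined.
        raise ValueError("mean direction is undefined for these data")
    return math.degrees(math.atan2(s, c)) % 360.0

dirs = [350.0, 20.0]                   # two roughly northerly winds
arithmetic = sum(dirs) / len(dirs)     # lands near south, which is misleading
circular = circular_mean_deg(dirs)     # stays close to north
print(arithmetic, circular)
```

The arithmetic mean ignores the wrap-around at 0/360 degrees, which is the same reason imputation methods built for linear Gaussian variables cannot be applied to directions as they stand.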

For now I'm considering the following:
- Look at copula functions to build a bivariate distribution of wind direction and speed, and simulate values from it to fill in the missing data. Produce several estimates of each missing value to assess the variability of the final results. The bivariate distribution should be modelled separately for every 5- or 10-year interval to accommodate a possible trend in the data.
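
The overall workflow (fill in each gap several times, one distribution per multi-year window) can be sketched as below. Note the substitution: this sketch does not fit a copula. As a simpler stand-in it draws (direction, speed) pairs jointly from the observed records of the same window, which keeps the empirical dependence between the two variables. All names, the window length and the toy records are assumptions made for illustration.

```python
import random

def impute_once(records, window_years, rng):
    """Fill missing (direction, speed) pairs by drawing an observed pair
    from the same multi-year window.

    records: list of (year, direction, speed); direction and speed are
    both None when the record is missing.
    """
    donors = {}
    for year, direction, speed in records:
        if direction is not None:
            donors.setdefault(year // window_years, []).append((direction, speed))

    completed = []
    for year, direction, speed in records:
        if direction is None:
            pool = donors.get(year // window_years)
            if pool:  # stays missing if the window has no observed records
                direction, speed = rng.choice(pool)
        completed.append((year, direction, speed))
    return completed

def multiple_imputations(records, m=5, window_years=10, seed=0):
    """Return m independently completed copies of the records."""
    rng = random.Random(seed)
    return [impute_once(records, window_years, rng) for _ in range(m)]

# Toy records (hypothetical values).
records = [
    (1961, 350.0, 4.2), (1962, 10.0, 3.1), (1963, None, None),
    (1975, 180.0, 6.0), (1976, None, None),
    (1985, None, None),  # no observed data in this window
]
completed = multiple_imputations(records, m=3)
```

Computing the wind rose on each completed copy and comparing the results would give a rough idea of the variability due to imputation. This stand-in ignores hour of day and season, and it cannot fill windows with no observed records at all, so long shutdowns would still need separate treatment.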

Well, now you know more or less that I do not know a lot about missing data and desperately need your help :) If you have hints on which techniques I might use, or any general advice, please let me know.

Thanks a lot,



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri May 19 23:02:34 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 20 May 2006 - 00:10:17 EST.
