Re: [R] Excel Trend Function

From: Daniel Malter <daniel_at_umd.edu>
Date: Sun, 13 Jul 2008 15:19:00 -0400


Disputed was probably not the correct wording for it. However, imputation means that you make assumptions regarding the distribution of your missing data dependent on the data that is available to you. Felipe had time series data, and it is common to predict from the past to the future in such models. However, I tried to outline why making these assumptions may be critical in Felipe's case due to the "environment" in which his data was missing (namely that the distribution of his variable around the missing values seemed odd and raised questions whether or not they were measure without error).

The rule of thumb for imputation that I remember is: If you are not sure how plausible the assumptions were that you would have to make - drop it. But better, since newer, advice maybe available on that. However, I cannot remember reading a single study in my field in which data was imputed (I don't say there is none, but if they exist, they must be very rare), meaning that observations with missing data are typically dropped. Therefore, how you will be received for imputing data does not only depend on the particular application, but also on the research tradition of your field.  

General sources that review data imputation as well as associated problems are:

Horton and Lipsitz, 2001, The American Statistician Schafer and Graham, 2002, Psychological Methods

Best,
Daniel



cuncta stricte discussurus

-----Ursprüngliche Nachricht-----
Von: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] Im Auftrag von paulandpen
Gesendet: Sunday, July 13, 2008 4:27 AM
An: mazatlanmexico_at_yahoo.com; r-help_at_stat.math.ethz.ch Betreff: Re: [R] Excel Trend Function

Hi Felipe,

Daniel mentions imputation is a disputed practice. There are recommendations and rules of thumb for its use. I am not sure that imputation is disputed. I would be interested to see some links to articles recommending against its use.

Paul

> Hi:
> I have a dataset and need to interpolate for missing days. In Excel I
> either average from sampled days from above and below the missing days or
> use the TREND function to make up for the missing values. I have been
> reading about na.approx, is this function similar to the TREND function?
> Which is the best recommendable way to make up for missing data?
> Here's my dataset: weeks 17,18,26 and 46 have 0 daysSamp.
>
> Year Week daysSamp Lower TotalPD Upper varTotalPD
> 2006 47 6 126988 188259 249530 1045878675
> 2006 48 7 189155 253350 317545 1148102355
> 2006 49 7 103300 132741 162182 241480186
> 2006 50 6 11801 252576 493352 16151006813
> 2006 51 7 2348 3671 4994 487926
> 2006 52 5 2606 29901 57196 215454181
> 2006 2 7 2968 4513 6058 664723
> 2006 3 7 1128 1889 2650 161231
> 2006 4 7 479 963 1447 65196
> 2006 5 7 2819 4413 6007 708094
> 2006 6 6 -1009 3128 7264 4766743
> 2006 7 7 -5239 10769 26777 71387835
> 2006 8 7 150 503 856 34685
> 2006 9 7 1858 2989 4120 356562
> 2006 10 7 193 494 795 25281
> 2006 11 7 125 346 567 13627
> 2006 12 7 432 767 1102 31189
> 2006 13 7 1229 1867 2505 113569
> 2006 14 7 813 1339 1865 77140
> 2006 15 4 -66 124 315 10105
> 2006 16 7 152 903 1654 157242
> 2006 17 0
> 2006 18 0
> 2006 19 5 0 0 0 0
> 2006 20 4 0 0 0 0
> 2006 21 5 0 0 0 0
> 2006 22 6 0 0 0 0
> 2006 23 7 -65 285 635 34112
> 2006 24 6 0 0 0 0
> 2006 25 7 0 0 0 0
> 2006 26 0
> 2006 27 4 228 931 1634 137726
> 2006 28 4 801 2231 3662 569977
> 2006 29 4 4544 9242 13939 6147522
> 2006 30 5 15798 28465 41131 44697915
> 2006 31 5 25398 41049 56701 68245523
> 2006 32 5 48197 82216 116235 322416917
> 2006 33 5 142980 230411 317841 2129630128
> 2006 34 5 227141 360468 493794 4952314336
> 2006 35 5 467244 756325 1045405 23281569629
> 2006 36 5 281049 463331 645614 9256900449
> 2006 37 2 227636 620330 1013023 42961663047
> 2006 38 3 478990 983472 1487954 70903343603
> 2006 39 7 539690 846522 1153354 26228718974
> 2006 40 7 320959 457866 594773 5221891252
> 2006 41 7 427561 582452 737343 6683813344
> 2006 42 7 271788 351103 430418 1752614293
> 2006 43 7 165019 208853 252687 535301133
> 2006 44 7 91514 117390 143266 186537178
> 2006 45 7 59061 79187 99313 112842787
> 2006 46 0
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 13 Jul 2008 - 19:26:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 13 Jul 2008 - 19:31:35 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive