# Re: [R] Calculate NAs from known data: how to?

From: Brian G. Peterson <brian_at_braverock.com>
Date: Tue 17 Oct 2006 - 11:48:53 GMT

Torleif Markussen Lunde wrote:
> In a dataset I have length and age for cod. The age, however, is only
> given for 40-100% of the fish. What I need to do is to fill in the NAs
> in a correct way, so that age has a value for each length. This is to be
> done for each sample separately (there are 324 samples), meaning the NAs
> for sampleno 1 shall be calculated from the known values from sampleno 1.
>
> As, for example, a length of 55 cm can correspond to both 4 and 5 years, I
> guess a fish with NA age and length 55 cm should be given a "random" age
> drawn with a probability, for example "55 cm = 4 years has p=75%, while
> 55 cm = 5 years has p=25%". Those probabilities should be calculated from
> the real data.
>
> How can this be done in R, and what is the right way to do it?
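The random-draw idea in the question can be sketched as follows. This is illustrative Python rather than R, and both the record layout (tuples of sample number, length in cm, and age, with `None` standing in for NA) and the function name `impute_ages_random` are hypothetical. Within each (sample, length) group, drawing uniformly from the list of observed ages gives each age a probability equal to its relative frequency, which is the "p" described above. A fish whose length has no aged fish in the same sample is left as NA.

```python
import random
from collections import defaultdict


def impute_ages_random(records, seed=None):
    """records: list of (sample_no, length_cm, age_or_None) tuples (hypothetical layout).

    Returns a new list in which each missing age is drawn from the observed
    ages of fish with the same sample number and length. Ages that are
    already known are kept unchanged.
    """
    rng = random.Random(seed)

    # Collect the observed ages for each (sample, length) group.
    observed = defaultdict(list)
    for sample, length, age in records:
        if age is not None:
            observed[(sample, length)].append(age)

    result = []
    for sample, length, age in records:
        if age is None:
            pool = observed.get((sample, length))
            # A uniform draw from the list of observed ages picks each age
            # with probability equal to its relative frequency in the group.
            # If no fish of this length was aged in this sample, stay NA.
            age = rng.choice(pool) if pool else None
        result.append((sample, length, age))
    return result
```

For example, if sample 1 contains three aged 55 cm fish of age 4 and one of age 5, an unaged 55 cm fish in that sample is assigned age 4 with probability 0.75 and age 5 with probability 0.25.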

Given the size of your sample, wouldn't it be more statistically valid to set the age of the NA records to the mean age of the known records of matching length? I suppose you could also use resampling or a bootstrap, but I'm not sure that adding randomization will give results that are any more statistically valid than using the mean.
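The mean-imputation alternative suggested here can be sketched in the same way. Again this is illustrative Python with the same hypothetical record layout; the function name `impute_ages_mean` is also hypothetical, and grouping by sample as well as by length is an assumption carried over from the question's requirement that each sample be handled separately.

```python
from collections import defaultdict
from statistics import mean


def impute_ages_mean(records):
    """records: list of (sample_no, length_cm, age_or_None) tuples (hypothetical layout).

    Replaces each missing age with the mean observed age of fish with the
    same sample number and length; leaves it as None if no such fish exist.
    """
    # Collect the observed ages for each (sample, length) group.
    observed = defaultdict(list)
    for sample, length, age in records:
        if age is not None:
            observed[(sample, length)].append(age)

    # Mean observed age per group.
    group_means = {key: mean(ages) for key, ages in observed.items()}

    # Keep known ages; fill NAs from the group mean (None if the group is empty).
    return [
        (sample, length, age if age is not None else group_means.get((sample, length)))
        for sample, length, age in records
    ]
```

With the same example sample (aged 55 cm fish of ages 4, 4, 4 and 5), the unaged 55 cm fish receives 4.25, so the imputed value is generally not a whole number of years.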

Regards,

- Brian

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Tue Oct 17 21:52:55 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 17 Oct 2006 - 12:30:10 GMT.
