Re: [R] assigning creating missing rows and values

From: <Adele_Thompson_at_cargill.com>
Date: Fri, 13 May 2011 15:54:25 -0500

This code works great. I will improve my descriptions of what I want in the future. Thanks for the help.

-----Original Message-----
From: gunter.berton_at_gene.com [mailto:gunter.berton_at_gene.com]
Sent: Friday, May 13, 2011 03:44 PM
To: Thompson, Adele - Adele_Thompson_at_cargill.com
Cc: dwinsemius_at_comcast.net; r-help_at_r-project.org
Subject: Re: [R] assigning creating missing rows and values

Adele:

You are of course correct -- my earlier proposed solution is dumb (thank you for being polite and not saying this :-) )

With your more complete explanation -- which might have gotten you a quicker reply from others had you given it at the beginning -- I was able to come up with what I believe is a reasonable solution, encapsulated in the fillIn() function below. Do note that it does no checking, which you might wish to add for a production version, assuming it does what you want.

Note that one can write a solution fairly trivially by iterating through the matched indices. But I wanted to find an efficient and simple one-pass solution, similar to what is done in rle(). Here it is. I think it's OK, but do check it out thoroughly before using.

fillIn <- function(
	x,        ## sorted index vector
	vals,     ## values corresponding to x to be filled in
	allx,     ## full set of all sorted indices of which x is a subset
	init = 0  ## initial value to use if first x is not first in allx
	)
{
	## positions of x within allx, plus a sentinel one past the end
	matchx <- c(match(x, allx), length(allx) + 1)
	## if allx starts before x does, pad the front with init
	if (x[1] != allx[1]) {
		vals <- c(init, vals)
		matchx <- c(1, matchx)
	}
	## repeat each value until the next observed index
	rep.int(vals, diff(matchx))
}

## Test it.

> vals <- letters[1:5]
> allx <- 1:5
> x <- 1:5
> fillIn(x,vals,allx)
[1] "a" "b" "c" "d" "e"

> n <- c(2,4)
> fillIn(x[n],vals[n],allx,init="z")
[1] "z" "b" "b" "d" "d"

> n <- c(1,3)
> fillIn(x[n],vals[n],allx,init="z")
[1] "a" "a" "c" "c" "c"

> n <- 3:5
> fillIn(x[n],vals[n],allx,init="z")
[1] "z" "z" "c" "d" "e"
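
To see why the second test gives that result, the intermediate values inside fillIn() can be traced by hand. The block below is only an illustrative sketch: it repeats the function so it runs on its own, and the names padded, runs and trace_result are made up for the trace and are not part of the original code.

```r
# Same logic as fillIn() in the message, repeated so this block is self-contained.
fillIn <- function(x, vals, allx, init = 0) {
  matchx <- c(match(x, allx), length(allx) + 1)
  if (x[1] != allx[1]) {
    vals <- c(init, vals)
    matchx <- c(1, matchx)
  }
  rep.int(vals, diff(matchx))
}

allx <- 1:5
x    <- c(2, 4)
vals <- c("b", "d")

# Positions of x in allx, plus a sentinel one past the end of allx.
matchx <- c(match(x, allx), length(allx) + 1)   # 2 4 6

# x[1] is not allx[1], so the front is padded with the init value.
matchx <- c(1, matchx)                          # 1 2 4 6
padded <- c("z", vals)                          # "z" "b" "d"

# Run lengths: how many consecutive slots of allx each value covers.
runs <- diff(matchx)                            # 1 2 2

# Expanding each value by its run length gives the filled vector.
trace_result <- rep.int(padded, runs)
print(trace_result)
```

The sentinel length(allx) + 1 is what lets the last observed value run to the end of allx without a special case.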

In your application, x would correspond to the vector of times you have; vals to the cumulative intake at those times; allx is the complete set of times.
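
As a rough sketch of that mapping, the block below applies fillIn() to a short half-hour grid. The times and intake figures are invented for illustration, and init = 0 assumes that cumulative intake is zero before the first reading of the day; neither comes from the real data.

```r
# Same logic as fillIn() in the message, repeated so this block is self-contained.
fillIn <- function(x, vals, allx, init = 0) {
  matchx <- c(match(x, allx), length(allx) + 1)
  if (x[1] != allx[1]) {
    vals <- c(init, vals)
    matchx <- c(1, matchx)
  }
  rep.int(vals, diff(matchx))
}

# Hypothetical complete grid of half-hour times (HH:MM strings sort correctly).
alltimes <- c("08:30", "09:00", "09:30", "10:00",
              "10:30", "11:00", "11:30", "12:00")

# Hypothetical times at which the scale reported, with made-up cumulative intake.
times  <- c("09:00", "09:30", "10:30", "11:30")
intake <- c(0.4, 0.9, 1.3, 2.0)

# One row per grid time; gaps carry the previous cumulative value forward.
filled <- data.frame(time   = alltimes,
                     intake = fillIn(times, intake, alltimes, init = 0))
print(filled)
```

Because fillIn() does no checking, every element of times must appear in alltimes and both must be sorted; otherwise match() returns NA and rep.int() fails.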

As David suggested, this almost certainly replicates existing code, probably in one of Gabor's packages. But I needed the exercise (my brain cells are decaying). If you have already found such code, please use it instead of the above, as the existing version will almost certainly be better and better tested.
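
One way to cross-check the function without extra packages is base R's findInterval(), which returns, for each element of allx, the position of the last x that does not exceed it (0 if there is none). The helper name locfLookup below is made up for this sketch, and it assumes numeric, sorted indices; time strings would need converting to numbers first.

```r
# Hypothetical helper: carry the last observed value forward using findInterval().
locfLookup <- function(x, vals, allx, init = 0) {
  idx <- findInterval(allx, x)   # 0 where allx falls before the first x
  c(init, vals)[idx + 1]         # shift by one so index 0 picks up init
}

allx <- 1:5
vals <- letters[1:5]

n <- c(2, 4)
res_mid <- locfLookup(allx[n], vals[n], allx, init = "z")
print(res_mid)

n <- 3:5
res_tail <- locfLookup(allx[n], vals[n], allx, init = "z")
print(res_tail)
```

On the test cases from the message this gives the same vectors as fillIn(). It also does not need x to be a subset of allx, which may or may not be what is wanted.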

HTH, Bert

On Fri, May 13, 2011 at 6:34 AM, <Adele_Thompson_at_cargill.com> wrote:

> The problem with using cumsum is that the measured output is the cumulative feed consumed throughout the day. When the animals do not eat for 30 minutes or so, it will not output a new value, but as soon as they do eat, the scale will weigh the difference and then output the cumulative feed eaten that day. I can take the difference at each weighing so the numbers are the amount eaten at that feeding time, and then go back and use cumsum, but that seems a bit roundabout.
>
>
> -----Original Message-----
> From: gunter.berton_at_gene.com [mailto:gunter.berton_at_gene.com]
> Sent: Thursday, May 12, 2011 05:49 PM
> To: Thompson, Adele - Adele_Thompson_at_cargill.com
> Cc: dwinsemius_at_comcast.net
> Subject: Re: [R] assigning creating missing rows and values
>
> ... Your detailed subject matter knowledge is always more relevant
> than my general statistical comments.
>
> If you are recording something like total cumulative input and the
> missing values must be 0 input, you may wish to consider entering them
> as 0 and then using the cumsum() function on the vector of inputs to
> produce the cumulative totals for all times. Depending on your data,
> code that might do this is:
>
> data:
>
> time  = ordered vector of times at which you have data
> y = vector of (unaccumulated) input values, of same length as time
> alltimes = complete ordered vector of all times
>
>
> code: (untested)
>
> new.y <- rep(0,length(alltimes))
> new.y[match(time,alltimes)] <- y
> cumsum(new.y)  ## is your answer
>
> Note also that if y is a vector of your cumulative inputs,
> c(y[1],diff(y))  is a vector of the unaccumulated inputs.
>
> There are, of course, many slicker ways to do this, depending on the
> form of your data.
>
> Cheers,
> Bert
>
>
>
> But I may misunderstand your issues ...
>
> Cheers,
> Bert
>
>
>
> On Thu, May 12, 2011 at 2:26 PM,  <Adele_Thompson_at_cargill.com> wrote:
>> I am still working on the weights problem. If the animals do not eat (like after sunset), then no new feed weight will be calculated and no new row will be entered. Thus, if I just use the previous value, it should be correct for how much cumulative feed was eaten that day up to that point.
>> I will play around with that package and try getting it to work for me. Thank you.
>>
>> -----Original Message-----
>> From: gunter.berton_at_gene.com [mailto:gunter.berton_at_gene.com]
>> Sent: Thursday, May 12, 2011 04:13 PM
>> To: dwinsemius_at_comcast.net
>> Cc: Thompson, Adele - Adele_Thompson_at_cargill.com; r-help_at_r-project.org
>> Subject: Re: [R] assigning creating missing rows and values
>>
>> ...  But beware: Last observation carried forward is a widely used but
>> notoriously bad (biased) way to impute missing values; and, of course,
>> inference based on such single imputation is bogus (how bogus depends
>> on how much imputation, among other things, of course).
>> Unfortunately, dealing with such data "well" requires considerable
>> statistical sophistication, which is why statisticians are widely
>> employed in the clinical trial business, where missing data in
>> longitudinal series are relatively common. You may therefore find it
>> useful to consult a local statistician if one is available.
>>
>> As an extreme -- and unrealistic -- example of the problem,  suppose
>> your series consisted of 12 hours of data measured every half hour and
>> that one series had only two measurements, the first and the last. The
>> first value is 10 and the last is 1. LOCF would fill in the missings
>> as all 10's. Obviously, a dumb thing to do. For real data, the problem
>> would  not be so egregious, but the fundamental difficulty is the
>> same.
>>
>> (Apologies to those for whom my post is a familiar, boring refrain.
>> Unfortunately, I do not have the imagination to offer better).
>>
>> Cheers,
>> Bert
>>
>> On Thu, May 12, 2011 at 1:43 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:
>>>
>>> On May 12, 2011, at 4:33 PM, Schatzi wrote:
>>>
>>>> I have a dataset where I have missing times (11:00 and 16:00). I would like
>>>> the outputs to include the missing time so that the final time vector looks
>>>> like "realt" and has the previous time's value. Ex. If meas at time 15:30 is
>>>> 0.45, then the meas for time 16:00 will also be 0.45.
>>>> meas are the measurements and times are the times at which they were taken.
>>>>
>>>> meas<-runif(18)
>>>>
>>>> times<-c("08:30","09:00","09:30","10:00","10:30","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00",
>>>> "15:30" ,"16:30","17:00","17:30","18:00")
>>>> output<-data.frame(meas,times)
>>>>
>>>> realt<-c("08:30","09:00","09:30","10:00","10:30","11:00","11:30","12:00","12:30","13:00","13:30","14:00","14:30","15:00","15:30","16:00","16:30","17:00","17:30","18:00")
>>>
>>> Package 'zoo' has an 'na.locf' function which I believe stands for "NA's
>>> last observation carried forward". So make a regular set of times, merge and
>>> "carry forward". I'm pretty sure you can find many examples in the Archive.
>>> Gabor is very good about spotting places where his many contributions can be
>>> successfully deployed.
>>>
>>> --
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> "Men by nature long to get on to the ultimate truths, and will often
>> be impatient with elementary studies or fight shy of them. If it were
>> possible to reach the ultimate truths without the elementary studies
>> usually prefixed to them, these would not be preparatory studies but
>> superfluous diversions."
>>
>> -- Maimonides (1135-1204)
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>
>
>
> --
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
>
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>



--

"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."

Bert Gunter
Genentech Nonclinical Biostatistics



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 13 May 2011 - 20:56:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 13 May 2011 - 21:40:07 GMT.