Re: [R] converting a data set to a format for time series analysis

From: Ricardo Pietrobon <pietr007_at_gmail.com>
Date: Wed, 11 Jun 2008 17:27:11 -0400

Jim, it worked perfectly. Thanks a lot.
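
For readers of the archive, the same per-site, per-month count can probably be written more compactly with aggregate(). This is only a sketch on the four example rows from the thread, not the code that was actually run. The use of read.table(text = ...), the integer year/month columns, and the renamed count column are my own assumptions. Including hospital_beds as a grouping variable relies on it being constant within each hospital.

```r
# Toy data copied from the thread
x <- read.table(text = "subject hospital date_enrollment hospital_beds
1 hospitalA 1/3/2002 300
2 hospitalA 1/6/2002 300
3 hospitalB 2/4/2002 150
4 hospitalC 3/2/2002 200", header = TRUE, stringsAsFactors = FALSE)

# Parse the enrollment date and derive numeric year and month
d <- as.Date(x$date_enrollment, "%m/%d/%Y")
x$year  <- as.integer(format(d, "%Y"))
x$month <- as.integer(format(d, "%m"))

# Count subjects per hospital, year and month. hospital_beds is assumed to be
# constant within a hospital, so grouping on it simply carries it along.
counts <- aggregate(subject ~ hospital + hospital_beds + year + month,
                    data = x, FUN = length)
names(counts)[names(counts) == "subject"] <- "number_enrolled_subjects"
counts <- counts[order(counts$hospital, counts$year, counts$month), ]
print(counts)
```

Like the split()/lapply() version below, this only returns hospital-month combinations that have at least one enrolled subject.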

On Mon, Jun 9, 2008 at 8:58 PM, jim holtman <jholtman_at_gmail.com> wrote:
> This should do it:
>
>> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
> + 1 hospitalA 1/3/2002 300
> + 2 hospitalA 1/6/2002 300
> + 3 hospitalB 2/4/2002 150
> + 4 hospitalC 3/2/2002 200"), header=TRUE)
>> closeAllConnections()
>> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
>> z <- cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
>> # partition the data
>> z.s <- split(z, list(z$year, z$month, z$hospital), drop=TRUE)
>> # now aggregate
>> do.call(rbind, lapply(z.s, function(a) data.frame(hospital=a$hospital[1],
> + cases=nrow(a), year=a$year[1], month=a$month[1],
> + beds=a$hospital_beds[1])))
>                    hospital cases year month beds
> 2002.01.hospitalA hospitalA     2 2002    01  300
> 2002.02.hospitalB hospitalB     1 2002    02  150
> 2002.03.hospitalC hospitalC     1 2002    03  200
>>
>>
>>
>
>
> On Mon, Jun 9, 2008 at 1:51 PM, Ricardo Pietrobon <pietr007_at_gmail.com>
> wrote:
>>
>> Jim, thanks a lot. This does the trick for dates, but what I have
>> been struggling the most with is actually the conversion from having
>> one subject per row to having one month per row. I didn't explain
>> that well at all in my previous email and so let me try again. The
>> idea is that the current data set is displayed with one subject per
>> row. I would like to have it displayed having one hospital per month
>> per row. For example, the new data set would look like this:
>>
>> month year site      number_enrolled_subjects hospital_beds
>> 1     2002 hospitalA 22                       300
>>
>> meaning that hospital A enrolled 22 subjects in 01/2002, and hospital
>> A has 300 beds -- the beds variable is one variable in a vector that
>> would display all the covariates for my ARIMA model
>>
>> Your suggestion solved the problem for the dates, but the command I am
>> looking for now is something that would count the number of subjects
>> per site per month of a year and then display it in the format
>> above. Any thoughts?
>>
>> I really appreciate your help
>>
>>
>>
>>
>> On Mon, Jun 9, 2008 at 1:04 PM, jim holtman <jholtman_at_gmail.com> wrote:
>> > Will something like this work for you:
>> >
>> >> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
>> > + 1 hospitalA 1/3/2002 300
>> > + 2 hospitalA 1/6/2002 300
>> > + 3 hospitalB 2/4/2002 150
>> > + 4 hospitalC 3/2/2002 200"), header=TRUE)
>> >> closeAllConnections()
>> >> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
>> >> cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
>> >   subject  hospital date_enrollment hospital_beds year month
>> > 1       1 hospitalA        1/3/2002           300 2002    01
>> > 2       2 hospitalA        1/6/2002           300 2002    01
>> > 3       3 hospitalB        2/4/2002           150 2002    02
>> > 4       4 hospitalC        3/2/2002           200 2002    03
>> >>
>> >>
>> >
>> >
>> > On Mon, Jun 9, 2008 at 12:45 PM, Ricardo Pietrobon <pietr007_at_gmail.com>
>> > wrote:
>> >>
>> >> I currently have a data set describing human subjects enrolled into an
>> >> international clinical trial, the name of the hospital enrolling this
>> >> human subject, the date when the subject was enrolled, and a vector
>> >> with variables representing characteristics of the site (e.g., number
>> >> of beds in a hospital). My data set looks like this:
>> >>
>> >> subject hospital date_enrollment hospital_beds
>> >> 1 hospitalA 1/3/2002 300
>> >> 2 hospitalA 1/6/2002 300
>> >> 3 hospitalB 2/4/2002 150
>> >> 4 hospitalC 3/2/2002 200
>> >>
>> >> To perform a time series analysis, I am now trying to get to a format
>> >> that would give me the following variables:
>> >>
>> >> month year site number_enrolled_subjects hospital_beds
>> >>
>> >> The data would be displayed at one-month intervals, with the number of
>> >> subjects grouped by site.
>> >>
>> >> Any help would be greatly appreciated.
>> >>
>> >> thanks
>> >>
>> >>
>> >> Ricardo
>> >>
>> >> ______________________________________________
>> >> R-help_at_r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>> > --
>> > Jim Holtman
>> > Cincinnati, OH
>> > +1 513 646 9390
>> >
>> > What is the problem you are trying to solve?
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
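
A follow-up note on the time series step: the split(..., drop=TRUE) approach above omits any month in which a site enrolled nobody, whereas a regular monthly series usually needs those months present as zeros. The sketch below is one possible way to fill them in and build a ts object for a single site. The helper names (all_months, ym, monthly, hospA) and the choice of table() with a factor are illustrative assumptions, not something proposed in the thread.

```r
# Toy data copied from the thread
x <- read.table(text = "subject hospital date_enrollment hospital_beds
1 hospitalA 1/3/2002 300
2 hospitalA 1/6/2002 300
3 hospitalB 2/4/2002 150
4 hospitalC 3/2/2002 200", header = TRUE, stringsAsFactors = FALSE)

d <- as.Date(x$date_enrollment, "%m/%d/%Y")

# Every calendar month between the first and last enrollment, as "YYYY-MM"
first <- as.Date(format(min(d), "%Y-%m-01"))
last  <- as.Date(format(max(d), "%Y-%m-01"))
all_months <- format(seq(first, last, by = "month"), "%Y-%m")

# A factor with all months as levels makes table() report zero counts too
ym <- factor(format(d, "%Y-%m"), levels = all_months)
monthly <- table(hospital = x$hospital, month = ym)

# Monthly series for one site, starting at the first observed month
start_ym <- c(as.integer(format(first, "%Y")), as.integer(format(first, "%m")))
hospA <- ts(as.vector(monthly["hospitalA", ]), start = start_ym, frequency = 12)
print(hospA)
```

Site-level covariates such as hospital_beds are constant over time in this example, so they would have to be handled separately from the count series.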



Received on Wed 11 Jun 2008 - 21:53:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 11 Jun 2008 - 22:30:44 GMT.
