Re: [R] converting a data set to a format for time series analysis

From: jim holtman <jholtman_at_gmail.com>
Date: Mon, 09 Jun 2008 20:58:26 -0400

This should do it:

> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
+  1       hospitalA       1/3/2002        300
+  2       hospitalA       1/6/2002        300
+  3       hospitalB       2/4/2002        150
+  4       hospitalC       3/2/2002        200"), header=TRUE)

> closeAllConnections()
> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
> z <- cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
> # partition the data
> z.s <- split(z, list(z$year, z$month, z$hospital), drop=TRUE)
> # now aggregate
> do.call(rbind, lapply(z.s, function(a) data.frame(hospital=a$hospital[1],
+     cases=nrow(a), year=a$year[1], month=a$month[1],
+     beds=a$hospital_beds[1])))  # beds comes from hospital_beds, not hospital
                   hospital cases year month beds
2002.01.hospitalA hospitalA     2 2002    01  300
2002.02.hospitalB hospitalB     1 2002    02  150
2002.03.hospitalC hospitalC     1 2002    03  200
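
For comparison, the same per-hospital, per-month count can probably be obtained more compactly with aggregate(). This is only a sketch under a few assumptions that are not part of the session above: it uses read.table(text=) instead of textConnection(), it assumes hospital_beds is constant within a hospital (so it can be used as a grouping variable), and the names res and cases are just illustrative.

```r
# Same toy data as in the session above
x <- read.table(text = "subject hospital date_enrollment hospital_beds
1 hospitalA 1/3/2002 300
2 hospitalA 1/6/2002 300
3 hospitalB 2/4/2002 150
4 hospitalC 3/2/2002 200", header = TRUE, stringsAsFactors = FALSE)

# Dates are month/day/year
y <- as.Date(x$date_enrollment, "%m/%d/%Y")
x$year  <- format(y, "%Y")
x$month <- format(y, "%m")

# Count subjects per hospital/year/month. hospital_beds is part of the
# grouping on the assumption that it never changes within a hospital.
res <- aggregate(subject ~ hospital + hospital_beds + year + month,
                 data = x, FUN = length)

# The counted column keeps the name "subject"; rename it to "cases"
names(res)[names(res) == "subject"] <- "cases"
res
```

If hospital_beds could change over time for a hospital, this grouping would split that hospital's month into several rows, so the split()/lapply() version is the safer choice in that case.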

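
Since the stated goal is a time series analysis on one-month intervals, it may also help to have months with no enrollment show up as zeros rather than being absent. The following is a rough sketch, not something from the session above: the names all_months, counts, and enrolled are made up for illustration, and it assumes a regular monthly grid spanning the first to the last enrollment date is what is wanted.

```r
x <- read.table(text = "subject hospital date_enrollment hospital_beds
1 hospitalA 1/3/2002 300
2 hospitalA 1/6/2002 300
3 hospitalB 2/4/2002 150
4 hospitalC 3/2/2002 200", header = TRUE, stringsAsFactors = FALSE)

d <- as.Date(x$date_enrollment, "%m/%d/%Y")

# Every calendar month between the first and last enrollment date
first <- as.Date(format(min(d), "%Y-%m-01"))
last  <- as.Date(format(max(d), "%Y-%m-01"))
all_months <- format(seq(first, last, by = "month"), "%Y-%m")

# Using a factor with all months as levels makes table() report
# zero counts for months in which a hospital enrolled nobody
ym <- factor(format(d, "%Y-%m"), levels = all_months)
counts <- table(ym, x$hospital)   # rows: months, columns: hospitals

# Plain matrix of counts, then a monthly multivariate ts (one column per hospital)
m <- matrix(as.vector(counts), nrow = nrow(counts),
            dimnames = list(NULL, colnames(counts)))
start_ym <- as.integer(c(format(first, "%Y"), format(first, "%m")))
enrolled <- ts(m, start = start_ym, frequency = 12)
enrolled
```

Each column of enrolled (for example enrolled[, "hospitalA"]) is then a regular monthly series of enrollment counts for one hospital.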

On Mon, Jun 9, 2008 at 1:51 PM, Ricardo Pietrobon <pietr007_at_gmail.com> wrote:

> Jim, thanks a lot. This does the trick for dates, but what I have
> been struggling the most with is actually the conversion from having
> one subject per row to having one month per row. I didn't explain
> that well at all in my previous email and so let me try again. The
> idea is that the current data set is displayed with one subject per
> row. I would like to have it displayed having one hospital per month
> per row. For example, the new data set would look like this:
>
> month year site      number_enrolled_subjects hospital_beds
> 1     2002 hospitalA 22                       300
>
> meaning that hospital A enrolled 22 subjects in 01/2002, and hospital
> A has 300 beds -- the beds variable is one variable in a vector that
> would display all the covariates for my ARIMA model
>
> Your suggestion solved the problem for the dates, but the command I am
> looking for now is something that would count the number of subjects
> per site per month of a year and then display the result in the format
> above. Any thoughts?
>
> I really appreciate your help
>
>
>
>
> On Mon, Jun 9, 2008 at 1:04 PM, jim holtman <jholtman_at_gmail.com> wrote:
> > Will something like this work for you:
> >
> >> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
> > + 1 hospitalA 1/3/2002 300
> > + 2 hospitalA 1/6/2002 300
> > + 3 hospitalB 2/4/2002 150
> > + 4 hospitalC 3/2/2002 200"), header=TRUE)
> >> closeAllConnections()
> >> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
> >> cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
> >   subject  hospital date_enrollment hospital_beds year month
> > 1       1 hospitalA        1/3/2002           300 2002    01
> > 2       2 hospitalA        1/6/2002           300 2002    01
> > 3       3 hospitalB        2/4/2002           150 2002    02
> > 4       4 hospitalC        3/2/2002           200 2002    03
> >>
> >>
> >
> >
> > On Mon, Jun 9, 2008 at 12:45 PM, Ricardo Pietrobon <pietr007_at_gmail.com> wrote:
> >>
> >> I currently have a data set describing human subjects enrolled into an
> >> international clinical trial, the name of the hospital enrolling this
> >> human subject, the date when the subject was enrolled, and a vector
> >> with variables representing characteristics of the site (e.g., number
> >> of beds in a hospital). My data set looks like this:
> >>
> >> subject hospital date_enrollment hospital_beds
> >> 1 hospitalA 1/3/2002 300
> >> 2 hospitalA 1/6/2002 300
> >> 3 hospitalB 2/4/2002 150
> >> 4 hospitalC 3/2/2002 200
> >>
> >> to perform a time series analysis I am now trying to get to a format
> >> that would give me the following variables:
> >>
> >> month year site number_enrolled_subjects hospital_beds
> >>
> >> the data would be displayed on one-month intervals, and number of
> >> subjects clustered around sites.
> >>
> >> Any help would be greatly appreciated.
> >>
> >> thanks
> >>
> >>
> >> Ricardo
> >>
> >> ______________________________________________
> >> R-help_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Tue 10 Jun 2008 - 01:01:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 11 Jun 2008 - 22:30:44 GMT.
