Re: [R] Data Frame housekeeping

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Tue, 24 May 2011 15:46:08 -0400

On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:

> Hello,
>
> I have a large data frame that is organized by date in a peculiar way. I
> am seeking advice on how to transform the data into a format that is of
> more use to me.
>
> The data is organized as follows:
>
>    STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
> 1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
> 2 2402594 1997 10    1   -00003   -00005   -00001 -00039 -00031 -00036 -00033
> 3 2402594 1997 11    1   000025   000065   000070 000069 000115 000072 000093
>
> Where "MM" is the month of the year, and ELEM is the variable that the
> values in the X* columns describe (in the actual data there are 31 X
> columns, one for each day of the month). The values in bold are the
> values that are transferred into the small chart below (which is the
> result I hope to get). This is to give a sense of how the data is picked
> out of the original data frame.

Assuming this data frame is named 'tst':

require(reshape2)
# Only select idvars and X1:X3
mtst <- melt(tst[, 1:7], id.vars = 1:4)
str(mtst)
#----------

'data.frame':	54 obs. of  6 variables:
  $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
  $ YEAR    : num  1997 1997 1997 1997 1998 ...
  $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
  $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
  $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
  $ value   : chr  "-00233" "-00003" "000025" "000160" ...
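
Note that 'value' comes through as character and 'variable' as a factor. If numeric values and a numeric day of month would be more useful, a conversion along these lines might help. This is only a sketch in base R: 'mtst_demo' is a hypothetical stand-in typed in by hand from three values shown in this thread, so it leaves the real 'mtst' (and the output further down) untouched.

```r
# Toy stand-in for 'mtst', built by hand from values shown in this thread
# ('mtst_demo' is a made-up name, not an object from the original post)
mtst_demo <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = c(9, 10, 11),
  ELEM     = 1,
  variable = factor(c("X1", "X1", "X1"), levels = c("X1", "X2", "X3")),
  value    = c("-00233", "-00003", "000025"),
  stringsAsFactors = FALSE
)

# Day of month: strip the leading "X" from the old column name and convert
mtst_demo$DAY <- as.integer(sub("^X", "", as.character(mtst_demo$variable)))

# Zero-padded strings such as "-00233" parse as ordinary numbers
mtst_demo$value <- as.numeric(mtst_demo$value)

str(mtst_demo)
```

Whether the values need rescaling afterwards (tenths of a degree, for example) depends on the data source and is not something the thread states.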

dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM)
#---------

    STN_ID YEAR MM variable      1      2
1  2402594 1997  9       X1 -00233 -00339
2  2402594 1997  9       X2 -00204 -00339
3  2402594 1997  9       X3 -00119 -00343
4  2402594 1997 10       X1 -00003 -00207
5  2402594 1997 10       X2 -00005 -00289
6  2402594 1997 10       X3 -00001 -00278
7  2402594 1997 11       X1 000025 -00242
[output snipped]
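
To get from that result to the column names in your target layout (DAY, ELEM1, ELEM2), a renaming step along these lines might be enough. Again only a sketch in base R: 'wide_demo' is a hypothetical name, typed in by hand from the first six rows of the output above so that the example runs on its own.

```r
# First six rows of the dcast() output above, typed in by hand
# ('wide_demo' is a made-up name; with real data use the dcast() result)
wide_demo <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = rep(c(9, 10), each = 3),
  variable = rep(c("X1", "X2", "X3"), times = 2),
  `1`      = c("-00233", "-00204", "-00119", "-00003", "-00005", "-00001"),
  `2`      = c("-00339", "-00339", "-00343", "-00207", "-00289", "-00278"),
  check.names = FALSE,          # keep "1" and "2" as column names
  stringsAsFactors = FALSE
)

# Prefix every all-digit column name with "ELEM": "1" -> "ELEM1", "2" -> "ELEM2"
elem_cols <- grepl("^[0-9]+$", names(wide_demo))
names(wide_demo)[elem_cols] <- paste0("ELEM", names(wide_demo)[elem_cols])

# Rename 'variable' to 'DAY', keeping the "X1"-style labels of the target layout
names(wide_demo)[names(wide_demo) == "variable"] <- "DAY"

wide_demo
```

Matching on all-digit names rather than listing "1" and "2" explicitly means the same two lines should also cover further ELEM codes, assuming those codes are always plain integers.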

>
> I would like to organize the data so it looks like this:
>
>    STN_ID YEAR MM DAY  ELEM1  ELEM2
> 1 2402594 1997  9  X1 -00233 -00339
> 2 2402594 1997  9  X2 -00204 000077
> 3 2402594 1997  9  X3 -00119 000030

Where is that second column coming from? I don't see it in the data example.
>
> Such that I create a new column named "DAY" that is made up of the
> numbers following "X" in the original data.frame columns. Also, the ELEM
> values are converted to columns and parsed with the ELEM code (in this
> case 1 and 2).
>
> I have tried to split apart the columns, transform them, and bind them
> back together, but my ability to do so just isn't there yet. I am still
> fairly new to R, and would really appreciate some help in working
> towards organizing this data frame.
>
> Thanks in advance,
> Scott Hatcher
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT



Received on Tue 24 May 2011 - 19:48:59 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 25 May 2011 - 21:10:10 GMT.
