Re: [R] Data Frame housekeeping

From: Scott Hatcher <scott.v.hatcher_at_gmail.com>
Date: Wed, 25 May 2011 14:46:48 -0230

Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply. Thank you also for providing what I hoped joining this mailing list would bring: a way of discovering incredibly useful packages such as "reshape2", which you have introduced me to.

I have a follow up question to your solution (which should produce exactly what I need):

When I run the cast function to reassemble the data frame, I get:

Error in names(data) <- array_names(res$labels[[2]]) :
   'names' attribute [7] must be the same length as the vector [1]

This signaled to me that the function was returning 7 values where it expected only 1. To test this, I passed the summary function "mean" to the cast, and the call ran (although it produced only NAs, because my values are of class factor). What I don't understand is where these multiple values are coming from; there should be only a single value for each combination of the 4 id.vars given in the cast function (STN_ID, YEAR, MM, variable).
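
A minimal sketch of one way to check for this, using base R only. The data frame below is hypothetical toy data standing in for the melted frame (the name mtst and the column names follow the reply quoted below; the duplicated row is an assumption made for illustration, not something confirmed in the real data). It lists the rows that share the same id combination, and converts the factor values to numbers so that a summary such as mean no longer returns NA.

```r
# Hypothetical long-format data; the second row deliberately repeats the first.
mtst <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = c(9, 9, 9, 10),
  ELEM     = c(1, 1, 2, 1),
  variable = factor(c("X1", "X1", "X1", "X1")),
  value    = factor(c("-00233", "-00233", "-00339", "-00003"))
)

# Columns that together should identify exactly one value.
keys <- c("STN_ID", "YEAR", "MM", "variable", "ELEM")

# Flag every row whose key combination occurs more than once.
is_dup   <- duplicated(mtst[keys]) | duplicated(mtst[keys], fromLast = TRUE)
dup_rows <- mtst[is_dup, ]
print(dup_rows)

# Convert factor -> character -> numeric; as.numeric() on a factor alone
# would return the internal level codes rather than the printed values.
mtst$value_num <- as.numeric(as.character(mtst$value))
print(mean(mtst$value_num))
```

If dup_rows is empty for the real data, the extra values are coming from somewhere other than repeated id combinations.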

Thanks again for your help,

Scott Hatcher

On 24/05/2011 5:16 PM, David Winsemius wrote:
>
> On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
>
>> Hello,
>>
>> I have a large data frame that is organized by date in a peculiar way. I
>> am seeking advice on how to transform the data into a format that is of
>> more use to me.
>>
>> The data is organized as follows:
>>
>>   STN_ID  YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
>> 1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
>> 2 2402594 1997 10    1   -00003   -00005   -00001 -00039 -00031 -00036 -00033
>> 3 2402594 1997 11    1   000025   000065   000070 000069 000115 000072 000093
>>
>> Where "MM" is the month of the year, and ELEM is the variable that
>> the values in the X* columns describe (in the actual data there are 31 X
>> columns, one for each day of the month). The values in bold are the
>> values that are transferred into the small chart below (which is the
>> result I hope to get). This is to give a sense of how the data is picked
>> out of the original data frame.
>
> assuming this dataframe is named 'tst':
>
> require(reshape2)
> mtst <- melt(tst[, 1:7], id.vars=1:4)  # Only select idvars and X1:X3
> str(mtst)
> #----------
> 'data.frame': 54 obs. of 6 variables:
> $ STN_ID : num 2402594 2402594 2402594 2402594 2402594 ...
> $ YEAR : num 1997 1997 1997 1997 1998 ...
> $ MM : num 9 10 11 12 1 2 3 4 5 9 ...
> $ ELEM : num 1 1 1 1 1 1 1 1 1 2 ...
> $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
> $ value : chr "-00233" "-00003" "000025" "000160" ...
>
> dcast(mtst, STN_ID +YEAR+ MM + variable ~ ELEM)
> #---------
>   STN_ID  YEAR MM variable      1      2
> 1 2402594 1997  9       X1 -00233 -00339
> 2 2402594 1997  9       X2 -00204 -00339
> 3 2402594 1997  9       X3 -00119 -00343
> 4 2402594 1997 10       X1 -00003 -00207
> 5 2402594 1997 10       X2 -00005 -00289
> 6 2402594 1997 10       X3 -00001 -00278
> 7 2402594 1997 11       X1 000025 -00242
> snipped output
>
>>
>> I would like to organize the data so it looks like this:
>>
>>   STN_ID  YEAR MM DAY  ELEM1  ELEM2
>> 1 2402594 1997  9  X1 -00233 -00339
>> 2 2402594 1997  9  X2 -00204 000077
>> 3 2402594 1997  9  X3 -00119 000030
>
> Where is that second column coming from? I don't see it in the data
> example.
>>
>> Such that I create a new column named "DAY" that is made up of the
>> numbers following "X" in the original data.frame columns. Also, the ELEM
>> values are converted to columns and parsed with the ELEM code (in this
>> case 1 and 2).
>>
>> I have tried to split apart the columns, transform them, and bind them
>> back together, but my ability to do so just isn't there yet. I am still
>> fairly new to R, and would really appreciate some help in working
>> towards organizing this data frame.
>>
>> Thanks in advance,
>> Scott Hatcher
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
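
For reference, the DAY column and the ELEM1/ELEM2 layout described in the original question can also be sketched in base R, without reshape2. This is an illustration under stated assumptions, not the method used in the reply above: the three-day toy frame tst is hypothetical (its values are taken from the small example tables in the question), and the names id_cols, day_cols, long, e1, e2 and wide are made up for this sketch.

```r
# Hypothetical wide-format data: one row per station/year/month/ELEM,
# one X column per day (only three days here instead of 31).
tst <- data.frame(
  STN_ID = 2402594, YEAR = 1997, MM = 9, ELEM = c(1, 2),
  X1 = c("-00233", "-00339"),
  X2 = c("-00204", "000077"),
  X3 = c("-00119", "000030"),
  stringsAsFactors = FALSE
)

id_cols  <- c("STN_ID", "YEAR", "MM", "ELEM")
day_cols <- grep("^X[0-9]+$", names(tst), value = TRUE)

# Wide -> long: stack one block of rows per day column, taking DAY
# from the digits that follow "X" in the column name.
long <- do.call(rbind, lapply(day_cols, function(col) {
  data.frame(
    tst[id_cols],
    DAY   = as.integer(sub("^X", "", col)),
    value = as.numeric(as.character(tst[[col]]))
  )
}))

# Long -> one column per ELEM code: split by ELEM, rename, and join
# the two pieces on station/year/month/day.
keys <- c("STN_ID", "YEAR", "MM", "DAY")
e1 <- long[long$ELEM == 1, c(keys, "value")]
e2 <- long[long$ELEM == 2, c(keys, "value")]
names(e1)[names(e1) == "value"] <- "ELEM1"
names(e2)[names(e2) == "value"] <- "ELEM2"
wide <- merge(e1, e2, by = keys)
print(wide)
```

With the real data the same pattern would cover X1 to X31. By default merge() keeps only the STN_ID/YEAR/MM/DAY combinations present for both ELEM codes; passing all = TRUE keeps the unmatched ones as well.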



Received on Wed 25 May 2011 - 21:04:51 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 25 May 2011 - 21:10:10 GMT.
