RE: [R] RE: Reading Dates in a csv File

From: Mulholland, Tom <Tom.Mulholland_at_dpi.wa.gov.au>
Date: Tue 08 Feb 2005 - 16:46:59 EST


My first thought was that it all looked a bit complicated for something that should be straightforward.

I created a file called t.txt, worked out the way I would have done it, and then tested to see which approach was fastest. One little hiccup is that the two objects are not identical, and I thought they would be. Of course I could have made a typo somewhere, but there may also be something I have not come across. Guess it's time to see what identical really means.

> system.time({

+ file <- read.csv("t.txt",header=F,
+                     col.names =c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses = c("character",
+                                   "numeric",
+                                   "character",
+                                   "character",
+                                   "numeric")
+ )
+ file$d_field_3 <- as.POSIXct(strptime(file$d_field_3,format="%m/%d/%Y" ))
+ file$d_field_4 <- as.POSIXct(strptime(file$d_field_4,format="%m/%d/%Y %I:%M:%S %p" ))
+  })

[1] 0.00 0.00 0.02 NA NA
>
>
>
> read_file <- function(file,nrows=-1) {
+ 
+    # create temp classes
+    setClass("t_class_",representation("character"))
+    setAs("character", "t_class_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y")))
+ 
+    setClass("t_class2_", representation("character"))
+    setAs("character", "t_class2_", function(from)
+ as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
+ 
+    # read the file
+    file <- read.csv(file,
+                     header=FALSE,
+                     comment.char = "",
+                     nrows=nrows,
+                     as.is=FALSE,
+                     col.names=c("c_field_1",
+                                 "n_field_2",
+                                 "d_field_3",
+                                 "d_field_4",
+                                 "n_field_5"),
+                      colClasses=c("character",
+                                   "numeric",
+                                   "t_class_",
+                                   "t_class2_",
+                                   "numeric")
+                      )
+ 
+    # remove them now that we are done with them
+    removeClass("t_class_")
+    removeClass("t_class2_")
+ 
+    return(file)
+ 
+ }

> system.time(file2 <- read_file("t.txt"))
[1] 0.14 0.00 0.16 NA NA
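
Both timings above come from a five-row file, so they mostly measure fixed overhead (such as creating and removing the temporary classes) rather than parsing. A rough sketch of how the comparison could be repeated on a larger file follows. The 10,000-row size, the names two_pass and big_path, and the tz = "UTC" argument are assumptions added for this example and are not part of the original session; note also that %p depends on the locale's AM/PM strings.

```r
# Sketch only: build a larger copy of the sample data and time the
# two-pass approach (read as character, then convert).
rows <- c("MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60",
          "MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60",
          "MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65",
          "MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65",
          "MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70")
big_path <- tempfile(fileext = ".txt")    # hypothetical test file
writeLines(rep(rows, 2000), big_path)     # 10,000 lines instead of 5

two_pass <- function(path) {
  d <- read.csv(path, header = FALSE,
                col.names = c("c_field_1", "n_field_2", "d_field_3",
                              "d_field_4", "n_field_5"),
                colClasses = c("character", "numeric", "character",
                               "character", "numeric"))
  # tz = "UTC" is an assumption; the original calls used the local time zone
  d$d_field_3 <- as.POSIXct(strptime(d$d_field_3,
                                     format = "%m/%d/%Y", tz = "UTC"))
  d$d_field_4 <- as.POSIXct(strptime(d$d_field_4,
                                     format = "%m/%d/%Y %I:%M:%S %p",
                                     tz = "UTC"))
  d
}

timing <- system.time(big <- two_pass(big_path))
print(timing)
unlink(big_path)   # clean up the temporary file
```

The read_file function from the message can be timed the same way, with system.time(read_file(big_path)) run before the unlink() call.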
>
> identical(file, file2)

[1] FALSE
>
> file
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70

> file2
  c_field_1 n_field_2  d_field_3           d_field_4 n_field_5
1       MHK     76.53 2004-05-21 2004-05-04 16:00:00        60
2       MHK     76.53 2004-06-21 2004-05-05 16:00:00        60
3       MHK     76.53 2004-07-21 2004-05-06 16:00:00        65
4       MHK     76.53 2004-08-21 2004-05-07 16:00:00        65
5       MHK     76.53 2004-09-21 2004-05-08 16:00:00        70

> str(file)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70

> str(file2)
`data.frame':   5 obs. of  5 variables:
 $ c_field_1: chr  "MHK" "MHK" "MHK" "MHK" ...
 $ n_field_2: num  76.5 76.5 76.5 76.5 76.5
 $ d_field_3:`POSIXct', format: chr  "2004-05-21" "2004-06-21" "2004-07-21" "2004-08-21" ...
 $ d_field_4:`POSIXct', format: chr  "2004-05-04 16:00:00" "2004-05-05 16:00:00" "2004-05-06 16:00:00" "2004-05-07 16:00:00" ...
 $ n_field_5: num  60 60 65 65 70
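
Since the str() output looks the same for both objects, any difference is probably in something str() does not print, such as an attribute. A minimal sketch of how that could be narrowed down is below. The data frames a and b are small stand-ins, and the differing "tzone" attribute is a hypothetical difference chosen only to make the example run; it is not a claim about what actually differs between file and file2.

```r
# Sketch: find which columns differ, then compare their attributes.
a <- data.frame(id   = c("MHK", "MHK"),
                when = as.POSIXct(c("2004-05-21", "2004-06-21"), tz = "UTC"),
                stringsAsFactors = FALSE)
b <- a
attr(b$when, "tzone") <- "GMT"   # hypothetical difference, for illustration

identical(a, b)                  # FALSE: identical() also compares attributes

# Column-by-column check: a named logical vector, one entry per column
same_col <- mapply(identical, a, b)
print(same_col)

# For each column that differs, show the attributes side by side
for (nm in names(a)[!same_col]) {
  cat("Column:", nm, "\n")
  str(attributes(a[[nm]]))
  str(attributes(b[[nm]]))
}

# all.equal() reports differences in words rather than TRUE/FALSE
print(all.equal(a, b))
```

Applied to the real objects, the same mapply(identical, file, file2) call would show whether the mismatch is in the date columns or somewhere else.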

>

> -----Original Message-----
> From: Charles and Kimberly Maner [mailto:ckjmaner@carolina.rr.com]
> Sent: Tuesday, 8 February 2005 12:08 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] RE: Reading Dates in a csv File
>
>
>
> Hi all. Thanks for all of your help/suggestions. I found an old email in
> the R-help archives, pieced together a couple things and arrived at the
> solution below. As an additional followup, I thought I would go ahead and
> post it should other readers come across this same situation. Here goes..
>
> Raw data:
> MHK,76.53,05/21/2004,5/4/2004 4:00:00 PM,60
> MHK,76.53,06/21/2004,5/5/2004 4:00:00 PM,60
> MHK,76.53,07/21/2004,5/6/2004 4:00:00 PM,65
> MHK,76.53,08/21/2004,5/7/2004 4:00:00 PM,65
> MHK,76.53,09/21/2004,5/8/2004 4:00:00 PM,70
>
> Code:
> read_file <- function(file,nrows=-1) {
>
> # create temp classes
> setClass("t_class_",representation("character"))
> setAs("character", "t_class_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y")))
>
> setClass("t_class2_", representation("character"))
> setAs("character", "t_class2_", function(from)
> as.POSIXct(strptime(from,format="%m/%d/%Y %I:%M:%S %p")))
>
> # read the file
> file <- read.csv(file,
> header=FALSE,
> comment.char = "",
> nrows=nrows,
> as.is=FALSE,
> col.names=c("c_field_1",
> "n_field_2",
> "d_field_3",
> "d_field_4",
> "n_field_5"),
> colClasses=c("character",
> "numeric",
> "t_class_",
> "t_class2_",
> "numeric")
> )
>
> # remove them now that we are done with them
> removeClass("t_class_")
> removeClass("t_class2_")
>
> return(file)
>
> }
>
> If any of you folks know a better way and/or have comments/enhancements to
> this code, feel free to post/email your critique.
>
>
> Thanks,
> Charles
>
>
>
>
> > _____________________________________________
> > From: Charles and Kimberly Maner [mailto:ckjmaner@carolina.rr.com]
> >
> > Sent: Thursday, February 03, 2005 8:35 AM
> > To: 'r-help@stat.math.ethz.ch'
> > Subject: Reading Dates in a csv File
> >
> >
> > Hi all. I'm reading in a flat, comma-delimited file using read.csv.
> > It works marvelously for the most part. I am using the colClasses
> > argument to, basically, create numeric, factor and character classes for
> > the columns I'm reading in. However, a couple of the fields in the file
> > are date fields. I'm fully aware that POSIXct can be used as a class,
> > however the field must obey, (I think), the standard/default POSIXct
> > format. Hence the following question: Does anyone have a method they can
> > share to read in a non-standard formatted date to convert to POSIXct? I
> > can read it in then convert it, but that's a two pass approach and not as
> > elegant as a single pass through read.csv. I've read, from the
> > documentation, that "[o]therwise there needs to be an as method (from
> > package methods) for conversion from "character" to the specified formal
> > class" but I do not know and have not figured out how to do that.
> >
> > Any suggestion(s) would be greatly appreciated.
> >
> >
> > Thanks,
> > Charles
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>



Received on Tue Feb 08 15:58:35 2005

This archive was generated by hypermail 2.1.8 : Tue 08 Feb 2005 - 18:20:44 EST