Re: [R] extract date

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Tue 05 Apr 2005 - 21:40:55 EST

I just started using Gmail, and one thing I expected to be annoying but that is sometimes actually interesting is the ads on the right-hand side. They are keyed off the content of the email, and for your post they produced:

http://www.visibone.com/regular-expressions/?via=google120

http://www.regexpbuddy.com

The first one advertises a JavaScript reference card (which I happen to own and which is excellent). In any case, the regexp part of the card is fully reproduced on that web page, including dozens of example regexps you could try. I haven't explored the other web site.

Although I have not read it, there is a book called Mastering Regular Expressions.

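By the way, here is an alternative way of calculating nd in Prof. Ripley's post, just to give you something else to play with. I think I prefer his solution, but this one is arguably a bit simpler. The pattern has three alternatives separated by the two bars: "Date: ", everything up to and including a comma followed by a space (the weekday prefix), and everything from a space, two characters and a colon (the time) to the end of the string. Each one is deleted wherever it is present. Using gsub rather than sub makes it keep trying them after the first deletion instead of stopping: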

nd <- gsub("Date: |.*, | ..:.*$", "", dates)
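To try the alternative on the sample lines from the question, here is a minimal sketch. It contrasts sub() with gsub() and then turns the extracted strings into Date objects. Two steps are my own assumptions and not part of either solution in this thread: a two-digit year is taken to mean 20xx (the first sample line is ambiguous, as the thread points out), and LC_TIME is switched to the C locale so that %b recognizes the English month abbreviations.

```r
# Sample header lines copied from the original question.
dates <- c("Date: Sat, 21 Feb 04 10:25:43 GMT",
           "Date: 13 Feb 2004 13:54:22 -0600",
           "Date: Fri, 20 Feb 2004 17:00:48 +0000",
           "Date: Fri, 14 Jun 2002 16:22:27 -0400",
           "Date: Wed, 18 Feb 2004 08:53:56 -0500",
           "Date: 20 Feb 2004 02:18:58 -0600",
           "Date: Sun, 15 Feb 2004 16:01:19 +0800")

# sub() deletes only the first match, so the time and zone survive.
first_only <- sub("Date: |.*, | ..:.*$", "", dates)

# gsub() keeps deleting matches, leaving just "day month year".
nd <- gsub("Date: |.*, | ..:.*$", "", dates)

# ASSUMPTION: a trailing two-digit year means 20xx; expand it to four digits.
nd4 <- sub(" ([0-9]{2})$", " 20\\1", nd)

# ASSUMPTION: %b should match English abbreviations, so parse in the
# C time locale and restore the previous setting afterwards.
old <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
d <- as.Date(nd4, format = "%d %b %Y")
invisible(Sys.setlocale("LC_TIME", old))

print(first_only[2])  # "13 Feb 2004 13:54:22 -0600"
print(nd)
print(d)
```

Expanding every year to four digits before parsing with %Y avoids the effect seen in Prof. Ripley's quoted output, where %y reads only the "20" of "2004" and so yields 2020.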

On Apr 5, 2005 7:22 AM, Petr Pikal <petr.pikal@precheza.cz> wrote:
> Dear Prof. Ripley,
>
> Thank you for your answer. After some trial and error I ended up
> with a suitable extraction function, which gives me a substantial
> increase in positive answers.
>
> Nevertheless, I definitely need more practice with regular
> expressions; from the help page I can grasp only the easy things. Is
> there any "Regular expressions for dummies" available?
>
> Best regards
> Petr Pikal
>
> On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:
>
> > On Tue, 5 Apr 2005, Petr Pikal wrote:
> >
> > > Dear all,
> > >
> > > please, is there any way to extract a date from data
> > > like this:
> >
> > Yes, if you delimit all the possibilities.
> >
> > > ....
> > > "Date: Sat, 21 Feb 04 10:25:43 GMT"
> > > "Date: 13 Feb 2004 13:54:22 -0600"
> > > "Date: Fri, 20 Feb 2004 17:00:48 +0000"
> > > "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> > > "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> > > "Date: 20 Feb 2004 02:18:58 -0600"
> > > "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> > > ....
> > >
> > > I used
> > >
> > > strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> > > sep="-"), format="%d-%b-%Y")
> > >
> > > which suits lines 3:5 and 7 (the most common ones in my
> > > dataset) but obviously does not work with the other lines.
> >
> > For those examples, in character vector 'dates' (without quotes):
> >
> > > nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates)
> > > strptime(nd, "%d %b %y")
> > [1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
> > [6] "2020-02-20" "2020-02-15"
> >
> > You should be able to amend the regexp for a wider range of forms, but
> > your first line is ambiguous (2004 or 2021?) so there are limits.
> >
> > > If there is no straightforward solution, I can live with what I use now,
> > > but some automagical function like
> > >
> > > give.me.date.from.my.string.regardles.of.formating(x)
> > > would be great.
> >
> > It would be impossible: when Americans write 07/04/2004 they do not
> > mean April 7th.
> >
> > --
> > Brian D. Ripley,                  ripley@stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
>
> Petr Pikal
> petr.pikal@precheza.cz
>
>

Received on Tue Apr 05 21:52:39 2005
