Re: [R] duplicate values

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Sun, 16 Nov 2008 18:43:03 +0000 (GMT)

Is the question 'duplicated next to each other' or 'duplicated anywhere later'? I read it as the latter, so would use

dup <- duplicated(x$dt)

or

dup <- duplicated(x[c("Date", "time")])
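
To make the difference between the two readings concrete, here is a minimal sketch. The four-row data frame is made up for illustration (it is not the poster's data), and tz = "UTC" is an assumption added only so the result does not depend on the local timezone.

```r
# Hypothetical data: 03:00 appears in rows 1 and 2 (adjacent) and again in row 4.
x <- data.frame(
  Date = c("2008-6-1", "2008-6-1", "2008-6-1", "2008-6-1"),
  time = c("03:00:00", "03:00:00", "04:00:00", "03:00:00"),
  Temperature = c(6, 0, 6, 1),
  stringsAsFactors = FALSE
)
x$dt <- as.POSIXct(paste(x$Date, x$time), tz = "UTC")

# Reading 1: duplicated next to each other (equal to the previous row only)
dup_adjacent <- c(FALSE, diff(x$dt) == 0)

# Reading 2: duplicated anywhere earlier in the column
dup_anywhere <- duplicated(x$dt)

# Reading 2 again, on the original columns, without building dt
dup_cols <- duplicated(x[c("Date", "time")])

x[!dup_adjacent, ]  # keeps rows 1, 3 and 4
x[!dup_anywhere, ]  # keeps rows 1 and 3
```

On sorted data with no repeats further down, the two approaches should select the same rows; they differ only when a value recurs after an intervening row, as in row 4 here.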

Also, be very careful: date-time values like these can be duplicated as printed and yet refer to different times on the day when DST ends. E.g. there are both

"2008-10-26 02:30:00 CEST"
"2008-10-26 02:30:00 CET"

in the timezone of Germany (at least with the names my system gives me in English).
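
A small sketch of that pitfall, assuming the system's timezone database knows "Europe/Berlin" (the zone name is my choice for "the timezone of Germany"; the abbreviations printed may differ between systems). The two instants are constructed in UTC so that they are unambiguous.

```r
# Two distinct instants, one hour apart, on the day DST ended in 2008
t_utc <- as.POSIXct(c("2008-10-26 00:30:00", "2008-10-26 01:30:00"), tz = "UTC")

# The same instants shown as German local time, with the zone abbreviation
format(t_utc, tz = "Europe/Berlin", usetz = TRUE)

# Wall-clock strings without the zone: both read 02:30:00
wall <- format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")

duplicated(t_utc)  # no duplicates: the underlying times differ
duplicated(wall)   # the second string repeats the first
```

So whether such a pair counts as a duplicate depends on whether the comparison is made on the POSIXct values or on the character strings they were read from.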

On Sun, 16 Nov 2008, jim holtman wrote:

> This should do it for you:
>
>> x <- read.table(textConnection( "Date time Temperature
> + 1 2008-6-1 00:00:00 5
> + 2 2008-6-1 02:00:00 5
> + 3 2008-6-1 03:00:00 6
> + 4 2008-6-1 03:00:00 0
> + 5 2008-6-1 04:00:00 6
> + 6 2008-6-1 04:00:00 0
> + 7 2008-6-1 05:00:00 7
> + 8 2008-6-1 06:00:00 7"), header=TRUE)
>> closeAllConnections()
>> # create datetime
>> x$dt <- as.POSIXct(paste(x$Date, x$time))
>> # create list of duplicate values next to each other
>> dup <- c(FALSE, diff(x$dt) == 0)
>> # remove
>> x[!dup,]
> Date time Temperature dt
> 1 2008-6-1 00:00:00 5 2008-06-01 00:00:00
> 2 2008-6-1 02:00:00 5 2008-06-01 02:00:00
> 3 2008-6-1 03:00:00 6 2008-06-01 03:00:00
> 5 2008-6-1 04:00:00 6 2008-06-01 04:00:00
> 7 2008-6-1 05:00:00 7 2008-06-01 05:00:00
> 8 2008-6-1 06:00:00 7 2008-06-01 06:00:00
>
>
> On Sun, Nov 16, 2008 at 1:10 PM, Antje Nöthlich <antno_at_web.de> wrote:
>> Hei R Users,
>>
>> i have the following dataframe:
>>
>> Datetime Temperature and many more columns
>> 1 2008-6-1 00:00:00 5
>> 2 2008-6-1 02:00:00 5
>> 3 2008-6-1 03:00:00 6
>> 4 2008-6-1 03:00:00 0
>> 5 2008-6-1 04:00:00 6
>> 6 2008-6-1 04:00:00 0
>> 7 2008-6-1 05:00:00 7
>> 8 2008-6-1 06:00:00 7
>> . . .
>> . . .
>> . . .
>> 3000 2008-8-31 00:00:00 3
>>
>>
>> the problem is that row 3 & 4 and row 5 & 6 have the same "Datetime" value but they differ in the values of the "Temperature" column.
>> Now for the whole dataframe i would like to delete rows that have the same "Datetime" value as the prior row.
>> I have tried unique(dataframe), but it does not work here because the rows are no real duplicates of each other.
>> thanks in advance for your help!
>>
>> Antje
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Sun 16 Nov 2008 - 18:45:33 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 16 Nov 2008 - 20:30:27 GMT.
