Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

From: Peter Ehlers
Date: Mon, 25 Apr 2011 11:11:54 -0700

On 2011-04-25 10:19, Christoph Jäckel wrote:
> Hi Together,
> I have a problem with the plyr package - more precisely with the ddply
> function - and would be very grateful for any help. I hope the example
> here is precise enough for someone to identify the problem. Basically,
> in this step I want to identify observations that are identical in
> terms of certain identifiers (ID1, ID2, ID3) and just want to save
> those observations (in this step, without deleting any rows or
> manipulating any data) in a separate data.frame. However, I get the
> warning message below and the column with dates is messed up.
> Interestingly, the value column (the type is factor here, but if you
> change that with as.integer it doesn't make any difference) is handled
> correctly. Any idea what I do wrong?
> df<- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),ID2=c('a','b','b','c','d','e','e'),ID3=c("v1","v1","v1","v1","v2","v1","v1"),
> Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4","1985-05-5","1985-05-6","1985-05-7"),
> Value=c(1,2,3,4,5,6,7)))
> df[,1]<- as.character(df[,1])
> df[,2]<- as.character(df[,2])
> df$Date<- strptime(df$Date,"%Y-%m-%d")
> #Apparently there are two groups of observations that share the same IDs: ID1=2 and ID1=4
> ddply(df,.(ID1,ID2,ID3),nrow)
> #I want to save those IDs in a separate data.frame, so the desired output is:
> df[c(2:3,6:7),]
> #My idea: Write a custom function that only returns observations with multiple rows.
> #Seems to work except that the Date column doesn't make any sense anymore
> #Warning message: In output[[var]][rng]<- df[[var]]: number of items to replace is not a multiple of replacement length
> ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
> #Notice that it works perfectly if I only have one observation with multiple rows
> ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

I would characterize your problem as:
a) using strptime - this is what gives ddply() fits;
b) not using str() to check whether R agrees with
   you with respect to your data;
c) using cbind() inside data.frame(). This isn't
   wrong, but is rarely (in my experience) useful.

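If you use as.Date (or even nothing) on your Date variable, you'll find that ddply does what you want. To see why it doesn't work with strptime, check str(df) and then ?POSIXlt: strptime() returns a POSIXlt object, which is stored as a list of components (seconds, minutes, hours, day of month, ...) rather than as one value per date. In other words, you've converted your Date values to lists.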
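
A quick way to see the difference for yourself, as a minimal sketch in base R. The names dates, lt and d are only for illustration, and tz = "UTC" is my addition so the result does not depend on the local time zone:

```r
# Illustrative only: compare how the two date classes are stored.
dates <- c("1985-05-01", "1985-05-02", "1985-05-03")

lt <- strptime(dates, "%Y-%m-%d", tz = "UTC")  # returns a POSIXlt object
d  <- as.Date(dates, format = "%Y-%m-%d")      # returns a Date object

class(lt)               # "POSIXlt" "POSIXt"
is.list(unclass(lt))    # TRUE: underneath, a list of components
names(unclass(lt))      # the component names (sec, min, hour, mday, ...)

class(d)                # "Date"
is.numeric(unclass(d))  # TRUE: underneath, one number per date
length(unclass(d))      # 3
```

The Date vector has one element per observation, which is what code that fills a column row by row expects; the list underneath a POSIXlt object does not.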

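My comment about cbind() is to warn you that your Value variable, as you have constructed it, is a factor and not a number: cbind() coerces all of the columns to a single character matrix before data.frame() ever sees them.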
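
For what it's worth, here is a sketch of how I would build the same data without cbind() and with as.Date(). The zero-padded date strings, stringsAsFactors = FALSE, the name dups, and the requireNamespace() guard (so the snippet also runs where plyr is not installed) are my choices, not part of the original post:

```r
# cbind() on mixed input yields a character matrix: the numbers become strings.
m <- cbind(ID1 = c(1, 2), ID2 = c("a", "b"))
is.character(m)   # TRUE

# Build the data frame directly so each column keeps its own type.
df <- data.frame(
  ID1   = c(1, 2, 2, 3, 3, 4, 4),
  ID2   = c("a", "b", "b", "c", "d", "e", "e"),
  ID3   = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
  Date  = as.Date(c("1985-05-01", "1985-05-02", "1985-05-03", "1985-05-04",
                    "1985-05-05", "1985-05-06", "1985-05-07")),
  Value = c(1, 2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
str(df)   # check that R agrees with you about the column types

# Keep only the ID combinations that occur more than once.
if (requireNamespace("plyr", quietly = TRUE)) {
  dups <- plyr::ddply(df, c("ID1", "ID2", "ID3"),
                      function(x) if (nrow(x) <= 1) NULL else x)
  print(dups)
}
```

With Date stored as a Date, the ddply() call should return the rows for ID1 = 2 and ID1 = 4 with the dates intact.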

Peter Ehlers

> Thanks in advance,
> Christoph
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Christoph Jäckel (Dipl.-Kfm.)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Research Assistant
> Chair for Financial Management and Capital Markets | Lehrstuhls für
> Finanzmanagement und Kapitalmärkte
> TUM School of Management | Technische Universität München
> Arcisstr. 21 | D-80333 München | Germany
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Received on Mon 25 Apr 2011 - 18:14:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 25 Apr 2011 - 18:20:35 GMT.