[R] lapply or data.table to find a unit's previous transaction

From: William Rogers <whrogers73_at_gmail.com>
Date: Wed, 02 Jun 2010 21:29:56 -0500


I have a dataset of property transactions that includes the transaction ID (TranID), property ID (UnitID), and transaction date (TranDt). I need to create a data frame (or data table) that includes the previous transaction date, if one exists. This is an easy problem in SQL, where I just run a sub-query, but I'm trying to make R my one-stop-shopping program.  The following code works on a subset of my data, but I can't run this on my full dataset because my computer runs out of memory after about 30 minutes. (Using a 32-bit machine.)
Use the following synthetic data for example.

n<- 100
TranID<- lapply(n:(2*n), function(x) (
as.matrix(paste(x, sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), "days"), sample(1:5, 1)), sep= "D"), ncol= 1)))

TranID<- do.call("rbind", TranID)
UnitID<- substr(TranID, 1, nchar(n))
TranDt<- substr(TranID, nchar(n)+2, nchar(n)+11)
Data<- data.frame(TranID= TranID, UnitID= UnitID, TranDt= as.Date(TranDt))

#First I create a list of all the previous transactions by unit

TranList<- as.matrix(Data$TranID, ncol= 1) PreTran<- lapply(TranList,

  function(x) (with(Data,
  Data[
  UnitID== substr(x, 1, nchar(n))&
  TranDt< Data[TranID== x, "TranDt"], ]
  ))
  )

#I do get warnings about missing data because some transactions have
no predecessor.
#Some transactions have no previous transactions, others have many so
I pick the most recent

BeforeTran<- lapply(seq_along(PreTran), function(x) ( with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))

#I need to add the current transaction's TranID to the list so I can merge later

BeforeTran<- lapply(seq_along(PreTran), function(x) ( transform(BeforeTran[[x]], TranID= TranList[x, 1])))

#Finally, I convert from a list to a data frame

BeforeTran<- do.call("rbind", BeforeTran)

#I have used a combination of data.table and for loops, but that seems
cheesey and doesn't preform much better.

library(data.table)

#First I create a list of all the previous transactions by unit

TranList2<- vector(nrow(Data), mode= "list") names(TranList2)<- levels(Data$TranID)
DataDT<- data.table(Data)

#Use a for loop and data.table to find the date of the previous transaction

for (i in levels(Data$TranID)) {
if (DataDT[UnitID== substr(i, 1, nchar(n))&    TranDt<= (DataDT[TranID== i, TranDt]), length(TranDt)]> 1)
TranList2[[i]]<- cbind(TranID= i,
DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt< (DataDT[TranID== i, TranDt]),
list(TranDt= max(TranDt))])
}

#Finally, I convert from a list to a data table

BeforeTran2<- do.call("rbind", TranList2)

#My intution says that this code doesn't take advantage of
data.table's attributes.
#Are there any ideas out there?  Thank you.
#P.S. I've tried plyr and it does not help my memory problem.

--
William H. Rogers

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Jun 2010 - 03:21:18 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Jun 2010 - 10:30:26 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive