Re: [R] Manipulating DataSets

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Thu, 29 May 2008 22:08:48 +0200

Neil Gupta wrote:
> Hello R-Users,
>
> I am new to R and trying my best however I need help with this simple task.
> I have a dataset, YM1207.
>       X.Symbol  Date       Time     Exchange TickType ReferenceNumber Price Size
> 12491 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85985770        13379 7
> 12492 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85985771        13380 4
> 12493 3:YMZ7.EC 12/03/2007 08:32:50 EC       T        85985845        13379 1
> 12494 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85985846        13379 7
> 12495 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85985847        13380 4
> 12496 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85986222        13379 6
> 12497 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85986223        13380 4
>
> I want to insert a column called NPrice which takes a pair of B,A and
> calculates its average Price, and then puts that number in the B row and A
> row in the new column NPrice. Each B, A is separated by +1 on the Reference
> Number. I want to skip T entries. T's do not come in between corresponding Bs
> and As. The other columns are not of interest. I would really appreciate it
> if I can get some help on this or refer me to a source that may.
>
>
I think this is a case where what you really need to do is to become aware of the tools you have in the toolbox. E.g., I already showed you one way to do it if the T's were absent:

N <- nrow(YM1207)
ix <- gl(N/2,2)
YM1207$NPrice <- ave(YM1207$Price, ix)

(OK, I forgot $Price last time...)
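To see what gl() and ave() are doing there, here is a minimal sketch on a toy vector. The prices are made up for illustration and the names price and pair_mean are not from your data; it assumes an even number of rows already arranged as consecutive pairs.

```r
# Toy prices: three consecutive B/A pairs (values invented for illustration)
price <- c(13379, 13380, 13379, 13381, 13378, 13380)

# gl(3, 2) builds the factor 1 1 2 2 3 3: one level per pair of rows
ix <- gl(length(price) / 2, 2)

# ave() replaces every value by the mean of the group it belongs to
pair_mean <- ave(price, ix)
print(pair_mean)
```

Each pair ends up carrying its own mean in both positions, which is exactly the shape needed for a new column.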

so how about making them disappear using

isAB <- YM1207$TickType %in% c("A","B")
ABprice <- YM1207$Price[isAB]

then do as before

N <- length(ABprice)
ix <- gl(N/2,2)
NPrice <- ave(ABprice, ix)

and put it back using

YM1207$NPrice <- NA
YM1207$NPrice[isAB] <- NPrice
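Put together on the seven rows you posted, the filter / average / put-back steps might look like the sketch below. The data frame name YM is a stand-in I made up, only the three relevant columns are rebuilt, and it assumes the A and B rows really do come in adjacent pairs once the T rows are dropped.

```r
# Rebuild the relevant columns of the posted rows (YM is a hypothetical name)
YM <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380)
)

# Keep only the A and B rows, then average each consecutive pair
isAB    <- YM$TickType %in% c("A", "B")
ABprice <- YM$Price[isAB]
ix      <- gl(length(ABprice) / 2, 2)

# Create the column first, then fill in the non-T rows
YM$NPrice <- NA
YM$NPrice[isAB] <- ave(ABprice, ix)
print(YM)
```

The T row keeps NA in NPrice, and every A or B row gets the mean of its pair.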

There are several ways to do this sort of thing. Another variation, closer to your original suggestion, would be to do

isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"
nPrice <- (YM1207$Price[isA]+YM1207$Price[isB])/2
YM1207$NPrice <- NA
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice

(you probably don't really need the NA assignment, but strange things can happen when you make subassignments into non-existing columns)
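This variation pairs the k-th A with the k-th B purely by position, so it may be worth checking that assumption before trusting the result. A sketch, again on a hypothetical data frame YM rebuilt from the posted rows, using your statement that each A follows its B by +1 in ReferenceNumber:

```r
# Hypothetical copy of the posted rows (only the columns needed here)
YM <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380)
)

isA <- YM$TickType == "A"
isB <- YM$TickType == "B"

# Positional pairing needs equally many As and Bs, and each A
# exactly one reference number after its B; stop otherwise
stopifnot(sum(isA) == sum(isB),
          all(YM$ReferenceNumber[isA] - YM$ReferenceNumber[isB] == 1))

nPrice <- (YM$Price[isA] + YM$Price[isB]) / 2
YM$NPrice <- NA
YM$NPrice[isA] <- YM$NPrice[isB] <- nPrice
print(YM)
```

If the stopifnot() call fails on the real data, there are unmatched ticks somewhere and they would need to be dealt with before averaging.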

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 29 May 2008 - 20:22:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 21:31:06 GMT.
