[R] Securities earning covariance

From: <ANGELO.LINARDI_at_bancaditalia.it>
Date: Thu, 05 Jun 2008 17:41:44 +0200


Good morning,

I am a new R user and am still learning how to use it. I am trying to solve the following problem.
I have a data frame df of daily earnings for securities over one year, as follows:

SEC_ID      DAY       EARNING
IT0000001   20070101  5.467
IT0000001   20070102  5.456
IT0000001   20070103  4.954
IT0000001   20070104  3.456
...
IT0000002   20070101  1.456
IT0000002   20070102  1.345
IT0000002   20070103  1.233
...
IT0000003   20070101  0.345
IT0000003   20070102  0.367
IT0000003   20070103  0.319
...

And so on: about 800 distinct SEC_ID values and about 180,000 rows. I need to calculate the "covariance" for each pair of securities x and y according to the formula:

Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
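Written out term by term, the formula simplifies. This is a sketch under stated assumptions: t indexes the N days, x_t and y_t are the two securities' earnings on day t, x' and y' are their yearly means, and sx, sy are assumed to use the same 1/N normalization as the numerator (the post does not say which normalization is intended). Under that assumption the N factors cancel and the quantity is exactly Pearson's correlation coefficient r_xy, not a covariance:

$$
s_x = \sqrt{\frac{1}{N}\sum_{t=1}^{N}(x_t - x')^2},
\qquad
s_y = \sqrt{\frac{1}{N}\sum_{t=1}^{N}(y_t - y')^2}
$$

$$
\mathrm{Cov}(x,y)
= \frac{\frac{1}{N}\sum_{t=1}^{N}(x_t - x')(y_t - y')}{s_x\, s_y}
= \frac{\sum_{t=1}^{N}(x_t - x')(y_t - y')}{\sqrt{\sum_{t=1}^{N}(x_t - x')^2}\,\sqrt{\sum_{t=1}^{N}(y_t - y')^2}}
= r_{xy}
$$

If sx and sy are instead computed with N - 1 in the denominator (as R's sd() does), the result is (N - 1)/N times r_xy.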

Here x' and y' are the mean earnings of the two securities over the year, N is the number of observations, and sx and sy are the standard deviations of x and y. To do this, I could build a data frame df2 like this:

DAY       SEC_ID.x   SEC_ID.y   EARNING.x  EARNING.y  x'  y'  sx  sy
20070101  IT0000001  IT0000002  5.467      1.456      a   b   aa  bb
20070101  IT0000001  IT0000003  5.467      0.345      a   c   aa  cc
20070101  IT0000002  IT0000003  1.456      0.345      b   c   bb  cc
20070102  IT0000001  IT0000002  5.456      1.345      a   b   aa  bb
20070102  IT0000001  IT0000003  5.456      0.367      a   c   aa  cc
20070102  IT0000002  IT0000003  1.345      0.367      b   c   bb  cc
...

(by merging df with itself on DAY, keeping only the pairs with SEC_ID.x < SEC_ID.y) and then easily apply the formula. However, the result is too large: the process stops with an out-of-memory message. Besides partitioning the input and using a loop, is there a smarter solution (possibly using split or some other form of "subgroup merging")?
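A back-of-the-envelope estimate shows why the self-merge runs out of memory. This is a sketch that assumes a balanced panel (every security observed on every day, so about 180000 / 800 = 225 days each) and, for the byte count, 8 bytes per cell across the nine df2 columns; both are assumptions, not figures from the post.

```python
from math import comb

n_securities = 800          # approximate number of distinct SEC_ID values (from the post)
n_rows = 180_000            # approximate number of rows in df (from the post)

# Assumption: balanced panel, i.e. every security has a value on every day.
n_days = n_rows // n_securities

# Unordered pairs kept by the condition SEC_ID.x < SEC_ID.y.
n_pairs = comb(n_securities, 2)

# The merge produces one df2 row per pair per day.
merged_rows = n_pairs * n_days

# Assumption: 9 columns (as in df2) at 8 bytes per cell.
approx_gb = merged_rows * 9 * 8 / 1e9

print(n_days, n_pairs, merged_rows, round(approx_gb, 1))
# prints: 225 319600 71910000 5.2
```

By contrast, the final answer is only an 800 x 800 matrix (319,600 distinct pairs), so the bottleneck is the intermediate long table, not the result.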
Are there any "shortcuts" using built-in statistical functions (e.g. cov, vcov)?
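Because the requested quantity is (under the 1/N assumption) the Pearson correlation, one common shortcut avoids the pairwise merge entirely: reshape the long data into a wide matrix with one row per DAY and one column per SEC_ID, then compute all pairwise correlations in a single call. In R this corresponds to reshape() followed by cor(); the sketch below shows the same idea in Python with NumPy instead, using only the nine sample values quoted in the post. It assumes a complete panel: with missing days, NaN values would propagate through np.corrcoef, and some pairwise handling (such as the use = "pairwise.complete.obs" option of R's cor()) would be needed. The helper name `formula` is hypothetical and only used to check the result against the post's definition.

```python
import numpy as np

# Long-format rows (SEC_ID, DAY, EARNING): the sample values quoted in the post,
# restricted to the three days shown for all three securities.
rows = [
    ("IT0000001", 20070101, 5.467), ("IT0000001", 20070102, 5.456), ("IT0000001", 20070103, 4.954),
    ("IT0000002", 20070101, 1.456), ("IT0000002", 20070102, 1.345), ("IT0000002", 20070103, 1.233),
    ("IT0000003", 20070101, 0.345), ("IT0000003", 20070102, 0.367), ("IT0000003", 20070103, 0.319),
]

# Map each SEC_ID and DAY to a column / row index.
sec_ids, sec_idx = np.unique([r[0] for r in rows], return_inverse=True)
days, day_idx = np.unique([r[1] for r in rows], return_inverse=True)

# Wide matrix: one row per DAY, one column per SEC_ID (NaN where a value is missing).
wide = np.full((len(days), len(sec_ids)), np.nan)
wide[day_idx, sec_idx] = [r[2] for r in rows]

# Pearson correlation between columns; assumes no missing values (a complete panel).
corr = np.corrcoef(wide, rowvar=False)

def formula(x, y):
    """Hypothetical helper: the post's formula with 1/N normalization (np.std defaults to ddof=0)."""
    return np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# Report each unordered pair once, mirroring SEC_ID.x < SEC_ID.y.
for i in range(len(sec_ids)):
    for j in range(i + 1, len(sec_ids)):
        print(sec_ids[i], sec_ids[j], round(corr[i, j], 4))
```

For the full data set the wide matrix would be roughly 225 x 800 and the correlation matrix 800 x 800, both small compared with the merged table.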
Thank you in advance

Angelo Linardi


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Thu 05 Jun 2008 - 18:12:39 GMT

