# Re: [R] Securities earning covariance

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Thu, 05 Jun 2008 11:54:59 -0400

Check out the three vignettes (i.e. pdf documents) in the zoo package, e.g.

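```
Lines <- "SEC_ID          DAY             EARNING
IT0000001       20070101        5.467
IT0000001       20070102        5.456
IT0000001       20070103        4.954
IT0000001       20070104        3.456
IT0000002       20070101        1.456
IT0000002       20070102        1.345
IT0000002       20070103        1.233
IT0000003       20070101        0.345
IT0000003       20070102        0.367
IT0000003       20070103        0.319
"

library(zoo)

# Read the sample data into a data frame (with real data, use the existing df).
DF <- read.table(textConnection(Lines), header = TRUE)

# One data frame per security.
DFs <- split(DF, DF$SEC_ID)

# Turn one security's rows into a zoo series indexed by date.
f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))

# Merge all series by date: one column per security.
z <- do.call(merge, lapply(DFs, f))
cov(z) # uses n-1
```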
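
Two details may matter when applying this to the full data. The securities do not all cover the same days (IT0000001 has an extra observation on 20070104), so the merged table contains NA values. The formula in the quoted message also divides by sx*sy, which makes it the correlation coefficient rather than the covariance, assuming sx and sy are computed with the same N denominator. The following is a minimal base-R sketch of both points, not a definitive implementation. The wide matrix `m` stands in for the merged zoo object so that the example runs without zoo, and the names `m`, `cov_pw`, `cor_pw`, `sdN` and `by_hand` are illustrative only.

```
# Sample data from the thread, in long format (one row per security and day).
DF <- data.frame(
  SEC_ID  = rep(c("IT0000001", "IT0000002", "IT0000003"), times = c(4, 3, 3)),
  DAY     = c(20070101:20070104, 20070101:20070103, 20070101:20070103),
  EARNING = c(5.467, 5.456, 4.954, 3.456,
              1.456, 1.345, 1.233,
              0.345, 0.367, 0.319)
)

# Wide layout: one row per day, one column per security.
# Each (day, security) cell holds a single value, so sum() just returns it;
# combinations that do not occur in the data are filled with NA.
m <- tapply(DF$EARNING, list(DF$DAY, DF$SEC_ID), sum)

# Use only the days on which both securities of a pair are observed.
cov_pw <- cov(m, use = "pairwise.complete.obs")  # denominator n - 1
cor_pw <- cor(m, use = "pairwise.complete.obs")

# The quoted formula, computed by hand for one pair, with N in every
# denominator (assumption: sx and sy are population standard deviations).
ok <- complete.cases(m[, c("IT0000001", "IT0000002")])
x <- m[ok, "IT0000001"]
y <- m[ok, "IT0000002"]
N <- length(x)
sdN <- function(v) sqrt(mean((v - mean(v))^2))
by_hand <- (sum((x - mean(x)) * (y - mean(y))) / N) / (sdN(x) * sdN(y))

# by_hand agrees with the corresponding entry of cor_pw.
all.equal(by_hand, cor_pw["IT0000001", "IT0000002"])
```

The same `use = "pairwise.complete.obs"` argument can be passed to `cov(z)` or `cor(z)` on the merged series. Since each pair may then be based on a different set of days, it is worth checking how many common observations each pair has before interpreting the result.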

On Thu, Jun 5, 2008 at 11:41 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
> Good morning,
>
> I am a new R user and I am trying to learn how to use it.
> I am trying to solve this problem.
> I have a data frame df of daily securities earnings (for one year) as
> follows:
>
> SEC_ID DAY EARNING
> IT0000001 20070101 5.467
> IT0000001 20070102 5.456
> IT0000001 20070103 4.954
> IT0000001 20070104 3.456
> ..........................
> IT0000002 20070101 1.456
> IT0000002 20070102 1.345
> IT0000002 20070103 1.233
> ..........................
> IT0000003 20070101 0.345
> IT0000003 20070102 0.367
> IT0000003 20070103 0.319
> ..........................
>
> And so on: about 800 different SEC_ID and about 180000 rows.
> I have to calculate the "covariance" for each pair of securities x and
> y according to the formula:
>
> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>
> where x' and y' are the means of the securities' earnings over the year,
> N is the number of observations, and sx and sy are the standard
> deviations of x and y.
> To do this I could build a df2 data frame like this:
>
> DAY       SEC_ID.x   SEC_ID.y   EARNING.x  EARNING.y  x'  y'  sx  sy
> 20070101  IT0000001  IT0000002  5.467      1.456      a   b   aa  bb
> 20070101  IT0000001  IT0000003  5.467      0.345      a   c   aa  cc
> 20070101  IT0000002  IT0000003  1.456      0.345      b   c   bb  cc
> 20070102  IT0000001  IT0000002  5.456      1.345      a   b   aa  bb
> 20070102  IT0000001  IT0000003  5.456      0.367      a   c   aa  cc
> 20070102  IT0000002  IT0000003  1.345      0.367      b   c   bb  cc
> ....................................................................
>
> (merging df with itself with the condition SEC_ID.x < SEC_ID.y) and then
> easily calculate the formula; but the dimensions are too big (the
> process stops with an out-of-memory message).
> Besides partitioning the input and using a loop, are there any smarter
> solutions (possibly using split and other ways of "subgroup merging")
> to solve the problem?
> Are there any "shortcuts" using statistical built-in functions (e.g.
> cov, vcov)?
>
> Angelo Linardi
