# Re: [R] Securities earning covariance

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Thu, 05 Jun 2008 11:56:24 -0400

Replace cov(z) with cov(z, use = "pair") if there are missing values, as there are here.
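
To illustrate the difference, here is a minimal sketch in base R only, so it runs without zoo; the same `use` argument should apply to the merged zoo object z from the quoted message. The names m, cov_pw and cor_pw are just illustrative. Also note that the formula in the original question divides by sx*sy, which makes it a correlation rather than a covariance, so cor() may be closer to what is wanted.

```r
# Toy data copied from the quoted message below.
Lines <- "SEC_ID DAY EARNING
IT0000001 20070101 5.467
IT0000001 20070102 5.456
IT0000001 20070103 4.954
IT0000001 20070104 3.456
IT0000002 20070101 1.456
IT0000002 20070102 1.345
IT0000002 20070103 1.233
IT0000003 20070101 0.345
IT0000003 20070102 0.367
IT0000003 20070103 0.319
"
DF <- read.table(text = Lines, header = TRUE)

# Wide matrix: one row per day, one column per security.
# Cells with no observation (e.g. IT0000002 on 20070104) become NA.
m <- tapply(DF$EARNING, list(DF$DAY, DF$SEC_ID), mean)

# Default use = "everything": any pair involving a column with an NA is NA.
cov(m)

# "pair" is matched to "pairwise.complete.obs": each pair of securities
# uses only the days on which both have an observation.
cov_pw <- cov(m, use = "pairwise.complete.obs")

# Same idea for the correlation, i.e. the covariance scaled by sx * sy.
cor_pw <- cor(m, use = "pairwise.complete.obs")
```

One caveat: with pairwise deletion each entry can be based on a different set of days, so the resulting matrix is not guaranteed to be positive semi-definite.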

On Thu, Jun 5, 2008 at 11:54 AM, Gabor Grothendieck <ggrothendieck_at_gmail.com> wrote:
> Check out the three vignettes (i.e. pdf documents) in the zoo package, e.g.
>
>
> Lines <- "SEC_ID DAY EARNING
> IT0000001 20070101 5.467
> IT0000001 20070102 5.456
> IT0000001 20070103 4.954
> IT0000001 20070104 3.456
> IT0000002 20070101 1.456
> IT0000002 20070102 1.345
> IT0000002 20070103 1.233
> IT0000003 20070101 0.345
> IT0000003 20070102 0.367
> IT0000003 20070103 0.319
> "
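> # read the sample data into a data frame
> DF <- read.table(textConnection(Lines), header = TRUE)
> # one data frame per security
> DFs <- split(DF, DF$SEC_ID)
>
> library(zoo)
> # turn one security's rows into a zoo series indexed by date
> f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))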
> z <- do.call(merge, lapply(DFs, f))
> cov(z) # uses n-1
>
>
> On Thu, Jun 5, 2008 at 11:41 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
>> Good morning,
>>
>> I am a new R user and I am trying to learn how to use it.
>> I am trying to solve this problem.
>> I have a dataframe df of daily securities (for a year) earnings as
>> follows:
>>
>> SEC_ID DAY EARNING
>> IT0000001 20070101 5.467
>> IT0000001 20070102 5.456
>> IT0000001 20070103 4.954
>> IT0000001 20070104 3.456
>> ..........................
>> IT0000002 20070101 1.456
>> IT0000002 20070102 1.345
>> IT0000002 20070103 1.233
>> ..........................
>> IT0000003 20070101 0.345
>> IT0000003 20070102 0.367
>> IT0000003 20070103 0.319
>> ..........................
>>
>> And so on: about 800 different SEC_ID and about 180000 rows.
>> I have to calculate the "covariance" for each pair of securities x and
>> y according to the formula:
>>
>> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>>
>> where x' and y' are the means of the securities' earnings over the year,
>> N is the number of observations, and sx and sy are the standard
>> deviations of x and y.
>> To do this I could build a df2 data frame like this:
>>
>> DAY      SEC_ID.x  SEC_ID.y  EARNING.x EARNING.y x' y' sx sy
>> 20070101 IT0000001 IT0000002 5.467     1.456     a  b  aa bb
>> 20070101 IT0000001 IT0000003 5.467     0.345     a  c  aa cc
>> 20070101 IT0000002 IT0000003 1.456     0.345     b  c  bb cc
>> 20070102 IT0000001 IT0000002 5.456     1.345     a  b  aa bb
>> 20070102 IT0000001 IT0000003 5.456     0.367     a  c  aa cc
>> 20070102 IT0000002 IT0000003 1.345     0.367     b  c  bb cc
>> ..........................................................
>>
>> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
>> easily calculate the formula; but the dimensions are too big (the
>> process stops with an out-of-memory message).
>> Besides partitioning the input and using a loop, are there any smarter
>> solutions (possibly using split and other ways of "subgroup merging")
>> to solve the problem?
>> Are there any "shortcuts" using statistical built-in functions (e.g.
>> cov, vcov)?
>>
>> Angelo Linardi
>>
>>
>>
