Re: [R] Securities earning covariance

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Thu, 05 Jun 2008 11:56:24 -0400

Replace cov(z) with cov(z, use = "pair") if there are missing values, as there are here.
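
To see what the use argument changes, here is a minimal base R sketch. The matrix m is a hypothetical stand-in for the merged zoo object z built from the sample rows (it is not the real data), with NA where a security has no observation on a given day. Note that "pair" works because use is partially matched to "pairwise.complete.obs".

```r
# Hypothetical 4 x 3 matrix mirroring the sample earnings; NA marks a day
# on which a security has no observation (here 20070104 for two of them).
m <- cbind(
  IT0000001 = c(5.467, 5.456, 4.954, 3.456),
  IT0000002 = c(1.456, 1.345, 1.233, NA),
  IT0000003 = c(0.345, 0.367, 0.319, NA)
)

# Default (use = "everything"): any entry touching a column with NA is NA.
cov_default <- cov(m)

# Pairwise: each entry uses only the rows where both columns are observed.
cov_pairwise <- cov(m, use = "pairwise.complete.obs")

print(cov_default)
print(cov_pairwise)
```

One thing to keep in mind: because each entry may be computed from a different set of rows, the pairwise matrix is not guaranteed to be positive semi-definite (see ?cov).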

On Thu, Jun 5, 2008 at 11:54 AM, Gabor Grothendieck <ggrothendieck_at_gmail.com> wrote:
> Check out the three vignettes (i.e. pdf documents) in the zoo package, e.g.
>
>
> Lines <- "SEC_ID DAY EARNING
> IT0000001 20070101 5.467
> IT0000001 20070102 5.456
> IT0000001 20070103 4.954
> IT0000001 20070104 3.456
> IT0000002 20070101 1.456
> IT0000002 20070102 1.345
> IT0000002 20070103 1.233
> IT0000003 20070101 0.345
> IT0000003 20070102 0.367
> IT0000003 20070103 0.319
> "
> DF <- read.table(textConnection(Lines), header = TRUE)
> DFs <- split(DF, DF$SEC_ID)
>
> library(zoo)
> f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
> z <- do.call(merge, lapply(DFs, f))
> cov(z) # uses n-1
>
>
> On Thu, Jun 5, 2008 at 11:41 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
>> Good morning,
>>
>> I am a new R user and I am trying to learn how to use it.
>> I am trying to solve this problem.
>> I have a data frame df of daily securities earnings (for one year), as
>> follows:
>>
>> SEC_ID DAY EARNING
>> IT0000001 20070101 5.467
>> IT0000001 20070102 5.456
>> IT0000001 20070103 4.954
>> IT0000001 20070104 3.456
>> ..........................
>> IT0000002 20070101 1.456
>> IT0000002 20070102 1.345
>> IT0000002 20070103 1.233
>> ..........................
>> IT0000003 20070101 0.345
>> IT0000003 20070102 0.367
>> IT0000003 20070103 0.319
>> ..........................
>>
>> And so on: about 800 different SEC_ID and about 180000 rows.
>> I have to calculate the "covariance" for each pair of securities x and
>> y according to the formula:
>>
>> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>>
>> where x' and y' are the mean earnings of each security over the year, N is
>> the number of observations, and sx and sy are the standard deviations of
>> x and y.
>> To do this I could build a df2 data frame like this:
>>
>> DAY      SEC_ID.x  SEC_ID.y  EARNING.x EARNING.y x' y' sx sy
>> 20070101 IT0000001 IT0000002 5.467     1.456     a  b  aa bb
>> 20070101 IT0000001 IT0000003 5.467     0.345     a  c  aa cc
>> 20070101 IT0000002 IT0000003 1.456     0.345     b  c  bb cc
>> 20070102 IT0000001 IT0000002 5.456     1.345     a  b  aa bb
>> 20070102 IT0000001 IT0000003 5.456     0.367     a  c  aa cc
>> 20070102 IT0000002 IT0000003 1.345     0.367     b  c  bb cc
>> ..........................................................
>>
>> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
>> easily calculate the formula; but the dimensions are too big (the
>> process stops with an out-of-memory message).
>> Besides partitioning the input and using a loop, are there any smarter
>> solutions (possibly using split and other ways of "subgroup merging")
>> to solve the problem?
>> Are there any "shortcuts" using statistical built-in functions (e.g.
>> cov, vcov) ?
>> Thank you in advance
>>
>> Angelo Linardi
>>
>>
>>
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
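
A side note on the formula in the original question: dividing the averaged cross-products by sx*sy gives the Pearson correlation rather than a covariance, provided sx and sy are standard deviations that also divide by N. If they are sample standard deviations (dividing by N-1, as sd() does), the result is the correlation scaled by (N-1)/N. The sketch below is one possible base R route that avoids the pairwise self-merge: it reshapes the sample rows into a DAY by SEC_ID matrix with tapply and checks one pair by hand. The names wide, r and by_hand are illustrative only, and restricting the hand calculation to days where both securities are observed is an assumption about what is wanted.

```r
# Sample rows from the question; the real data has about 800 securities.
Lines <- "SEC_ID DAY EARNING
IT0000001 20070101 5.467
IT0000001 20070102 5.456
IT0000001 20070103 4.954
IT0000001 20070104 3.456
IT0000002 20070101 1.456
IT0000002 20070102 1.345
IT0000002 20070103 1.233
IT0000003 20070101 0.345
IT0000003 20070102 0.367
IT0000003 20070103 0.319
"
DF <- read.table(text = Lines, header = TRUE)

# DAY x SEC_ID matrix; cells with no row in DF are left as NA.
# Each (DAY, SEC_ID) cell holds a single value, so sum just returns it.
wide <- with(DF, tapply(EARNING, list(DAY, SEC_ID), sum))

# All pairwise correlations at once, each from jointly observed days.
r <- cor(wide, use = "pairwise.complete.obs")

# Hand check of one pair against the formula from the question.
ok <- complete.cases(wide[, c("IT0000001", "IT0000002")])
x <- wide[ok, "IT0000001"]
y <- wide[ok, "IT0000002"]
N <- length(x)
sx <- sqrt(mean((x - mean(x))^2))  # standard deviation dividing by N
sy <- sqrt(mean((y - mean(y))^2))
by_hand <- (sum((x - mean(x)) * (y - mean(y))) / N) / (sx * sy)

print(r)
print(by_hand)
```

With about 800 securities and one row per day, the wide matrix has a few hundred rows and 800 columns and the result is 800 x 800, which is far smaller than the merged data frame of all pairs.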



Received on Thu 05 Jun 2008 - 18:31:14 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 05 Jun 2008 - 19:30:38 GMT.
