From: <ANGELO.LINARDI_at_bancaditalia.it>

Date: Fri, 06 Jun 2008 09:35:36 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 10:08:27 GMT

Date: Fri, 06 Jun 2008 09:35:36 +0200

It works perfectly, thank you so much.

Now I will try to put teh results into a suitable form (a data frame like this):

SEC_ID.x SEC_ID.y EARN_COV

Thank you again

Angelo Linardi

-----Messaggio originale-----

Da: Patrick Burns [mailto:pburns_at_pburns.seanet.com]
Inviato: giovedě 5 giugno 2008 18.11

**A: LINARDI ANGELO
**

Cc: r-help_at_r-project.org

Oggetto: Re: [R] Securities earning covariance

I would start by creating a matrix that held the returns with rows being the dates and columns being the securities. You can do this by something along the lines of:

days <- as.character(df[, 'DAY'])

sec <- as.character(df[, 'SEC_ID']

earningmat <- array(NA, c(length(unique(days)),

length(unique(sec))), list(sort(unique(days)),
unique(sec)))

submat <- cbind(match(days, rownames(earningmat)),

match(sec, colnames(earningmat)))

earningmat[submat] <- as.numeric(as.character(df[, 'EARNING']))

Notice that while the 'as.numeric-as.character' in the last line may not be needed -- if it is needed, it is needed in a big way. If the 'EARNING' column is a factor (because there was at least one item that didn't appear to be numeric when it was read in), then skipping the 'as.numeric-as.character' call will put the codes for the factor into the matrix. It will be numeric as you expect, but complete garbage.

The trick with 'submat' is explained in any complete description of subscripting -- the subscripting section of Chapter 1 of S Poetry, for instance.

Once you have a suitable matrix, then you can use 'var' or some other function to get the variance matrix. Depending on where you are going, a factor model variance may be better. You can get 'factor.model.stat' from the public domain area of the Burns Statistics website. This is especially useful if there are missing values in your matrix.

Patrick Burns

patrick_at_burns-stat.com

+44 (0)20 8525 0696

http://www.burns-stat.com

(home of S Poetry and "A Guide for the Unwilling S User")

ANGELO.LINARDI_at_bancaditalia.it wrote:

> Good morning,

*>
**> I am a new R user and I am trying to learn how to use it.
**> I am trying to solve this problem.
**> I have a dataframe df of daily securities (for a year) earnings as
**> follows:
**>
**> SEC_ID DAY EARNING
**> IT0000001 20070101 5.467
**> IT0000001 20070102 5.456
**> IT0000001 20070103 4.954
**> IT0000001 20070104 3.456
**> ..........................
**> IT0000002 20070101 1.456
**> IT0000002 20070102 1.345
**> IT0000002 20070103 1.233
**> ..........................
**> IT0000003 20070101 0.345
**> IT0000003 20070102 0.367
**> IT0000003 20070103 0.319
**> ..........................
**>
**> And so on: about 800 different SEC_ID and about 180000 rows.
**> I have to calculate the "covariance" for each couple of securities x
**> and y according to the formula:
**>
**> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
**>
**> being x' and y' the mean of securities earning in the year, N the
**> number of observations, sx and sy the standard deviation of x and y.
**> To do this I could build a df2 data frame like this:
**>
**> DAY SEC_ID.x SEC_ID.y EARNING.x
**> EARNING.y x' y' sx sy
**> 20070101 IT0000001 IT0000002 5.467 1.456
**> a b aa bb
**> 20070101 IT0000001 IT0000003 5.467 0.345
**> a c aa cc
**> 20070101 IT0000002 IT0000003 1.456 0.345
**> b c bb cc
**> 20070102 IT0000001 IT0000002 5.456 1.345
**> a b aa bb
**> 20070102 IT0000001 IT0000003 5.456 0.367
**> a c aa cc
**> 20070102 IT0000002 IT0000003 1.345 0.367
**> b c bb cc
**> ........................................................................
**> .......................................................
**>
**> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
**> easily calculate the formula; but the dimensions are too big (the
**> process stops whit an out-of-memory message).
**> Besides partitioning the input and using a loop, are there any smarter
**> solutions (eventually using split and other ways of "subgroup merging"
**> to solve the problem ?
**> Are there any "shortcuts" using statistical built-in functions (e.g.
**> cov, vcov) ?
**> Thank you in advance
**>
**> Angelo Linardi
**>
**>
**>
**> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona
**> fede e non comportano alcun vincolo ne' creano obblighi per la Banca
**> stessa, salvo che cio' non sia espressamente previsto da un accordo scritto.
**> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore,
**> La preghiamo di comunicarne via e-mail la ricezione al mittente e di
**> distruggerne il contenuto. La informiamo inoltre che l'utilizzo non
**> autorizzato del messaggio o dei suoi allegati potrebbe costituire reato. Grazie per la collaborazione.
**> -- E-mails from the Bank of Italy are sent in good faith but they are
**> neither binding on the Bank nor to be understood as creating any
**> obligation on its part except where provided for in a written
**> agreement. This e-mail is confidential. If you have received it by mistake, please inform the sender by reply e-mail and delete it from your system.
**> Please also note that the unauthorized disclosure or use of the
**> message or any attachments could be an offence. Thank you for your
**> cooperation. **
**>
**> ______________________________________________
**> R-help_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide
**> http://www.R-project.org/posting-guide.html
**> and provide commented, minimal, self-contained, reproducible code.
**>
**>
**>
*

- Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e non
comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che cio' non
sia espressamente previsto da un accordo scritto.
Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La preghiamo di
comunicarne via e-mail la ricezione al mittente e di distruggerne il contenuto. La
informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi allegati
potrebbe costituire reato. Grazie per la collaborazione.
-- E-mails from the Bank of Italy are sent in good faith but they are neither binding on
the Bank nor to be understood as creating any obligation on its part except where
provided for in a written agreement. This e-mail is confidential. If you have received it
by mistake, please inform the sender by reply e-mail and delete it from your system.
Please also note that the unauthorized disclosure or use of the message or any
attachments could be an offence. Thank you for your cooperation. **

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 10:08:27 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 10:30:36 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*