Re: [R] Conditional Sums for Index creation

From: Patnaik, Tirthankar <tirthankar.patnaik_at_citi.com>
Date: Tue, 15 May 2007 14:54:32 +0530


All,
 Happy to say that the problem could be solved. The key idea was from Patrick Burns (Convert the data-frame to a matrix!). As written earlier, the steps were to first get a object (call it ad) containing the non-missing entries at each row. Then run a sum over each row, selecting only those columns that are pointed to in the object ad.  

Another quick step was to use the fact that if there's an NA in any column (after a stock has entered the portfolio), then we could use the previous value just as well (e.g., the market cap of a stock). So na.locf was a big help!  

Just providing the code here for illustration purposes:  

> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac

   V1 V2 V3 V4 V5
1 27 93 82 NA NA
2 37 21 65 NA NA
3 57 65 78 NA NA
4 91 13 55 NA NA
5 20 27 53 NA NA
6 90 39 79 26 NA
7 94 NA NA 48 NA
8 66 38 48 77 NA
9 63 87 73 8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60
>

> # -----------------------------------------------------
> # Indexes of all the non-missing in all the rows
> ad <- apply(ac,1,function(y)which(!is.na(y)))
>
> af <- data.matrix(na.locf(ac,na.rm=FALSE))
> # Include another column as placeholder
> ag <- cbind(af,rep(1,nrow(af)))
> # Call it, "sumCorr".
> colnames(ag)[6] <- "sumCorr"
> ag[1,6] <- sum(ag[1,],na.rm=TRUE)
> for (r in 2:nrow(ag)){

+ sumCorr <- unlist(sum(ag[r,unlist(ad[r-1])],na.rm=TRUE))
+ ag[r,6] <- sumCorr
+ }

> ag

   V1 V2 V3 V4 V5 sumCorr

1  27 93 82 NA NA     203
2  37 21 65 NA NA     123
3  57 65 78 NA NA     200
4  91 13 55 NA NA     159
5  20 27 53 NA NA     100
6  90 39 79 26 NA     208
7  94 39 79 48 NA     260
8  66 38 48 77 NA     143
9  63 87 73  8 NA     231
10 63 34 69 88 NA     254
11 21 48 48 34 24     130
12 18 60 86 84 24     272
13 69 49 44 35 64     197
14 38 19 24 33 88     202
15 77 83 24 48 78     310
16 50 67 10 89 80     286
17 72 79 32 86 46     315
18 99 11 52 39 41     242
19 38 72 66 78 81     335
20 78 41 41 96 60     316

>

Gaurav,
Anything to not implement a double for-loop! :) With the implementation, I was able to generate my index at last. Perhaps it's because I'm quite new with R, but I find it quite arcane sometimes! :)    

best,
-Tir  

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887


From: gyadav_at_ccilindia.co.in [mailto:gyadav_at_ccilindia.co.in] Sent: Monday, May 14, 2007 6:29 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: r-help_at_stat.math.ethz.ch; r-help-bounces_at_stat.math.ethz.ch Subject: Re: [R] Conditional Sums for Index creation

Hi Tirthankar

this will help you

ind is a matrix which indicates the start of any new stock. ind[i,j] means that in j + 1 column all the values from 1st row to i - 1 row are all NAs.

> x

      V2 V3 V4 V5 V6

 [1,] 27 93 82 NA NA 
 [2,] 37 21 65 NA NA 
 [3,] 57 65 78 NA NA 
 [4,] 91 13 55 NA NA 
 [5,] 20 27 53 NA NA 
 [6,] 90 39 79 26 NA 
 [7,] 94 NA NA 48 NA 
 [8,] 66 38 48 77 NA 
 [9,] 63 87 73  8 NA 
[10,] NA 34 69 88 NA 
[11,] 21 48 48 34 24 
[12,] 18 60 86 84 NA 
[13,] 69 49 44 35 64 
[14,] 38 19 24 33 88 
[15,] 77 83 NA 48 78 
[16,] 50 67 10 89 80 
[17,] 72 79 32 86 46 
[18,] 99 11 52 39 41 

[19,] 38 72 66 78 81
[20,] 78 41 41 96 60
>
> for ( j in 1:length(x[1,]) - 1) {
+ for ( i in 2:length(x[,1])) { 
+ indicator<-TRUE 
+ for (k in 1: i - 1){ 
+ indicator <- indicator && is.na(x[k,j+1]) 
+ } 
+ ind[i,j]<-indicator 
+ 
+ } 
+ } 

> ind
V2 V3 V4 V5 V6 [1,] NA NA NA NA NA [2,] 0 0 NA NA 0 [3,] 0 0 NA NA 0 [4,] 0 0 NA NA 0 [5,] 0 0 NA NA 0 [6,] 0 0 NA NA 0 [7,] 0 0 0 NA 0 [8,] 0 0 0 NA 0 [9,] 0 0 0 NA 0 [10,] 0 0 0 NA 0 [11,] 0 0 0 NA 0 [12,] 0 0 0 0 0 [13,] 0 0 0 0 0 [14,] 0 0 0 0 0 [15,] 0 0 0 0 0 [16,] 0 0 0 0 0 [17,] 0 0 0 0 0 [18,] 0 0 0 0 0

[19,] 0 0 0 0 0
[20,] 0 0 0 0 0
>

Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: mailtogauravyadav_at_gmail.com Bhagavad Gita: Man is made by his Belief, as He believes, so He is

"Patnaik, Tirthankar " <tirthankar.patnaik_at_citi.com> Sent by: r-help-bounces_at_stat.math.ethz.ch

05/14/2007 11:53 AM

To
<r-help_at_stat.math.ethz.ch>
cc
Subject
[R] Conditional Sums for Index creation         

Hi,

                Apologies for the long mail. I have a data.frame with
columns of
price/mcap data for a portfolio of stocks, and the date. To get the total value of the portfolio on a daily basis, I calculate rowSums of the data.frame.

> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac
  V1 V2 V3 V4 V5
1 27 93 82 NA NA
2 37 21 65 NA NA
3 57 65 78 NA NA
4 91 13 55 NA NA
5 20 27 53 NA NA
6 90 39 79 26 NA
7 94 NA NA 48 NA
8 66 38 48 77 NA
9 63 87 73 8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60
>

Here the rows 1:20 are dates (also in my data.frame).

Since some of the prices have NA, the rowSums is made to ignore these entries.

> rowSums(ac,na.rm=TRUE)
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242 335 316
>

Stocks are being added to the portfolio too. So from date=6 (or row=6) we have the 4th stock V4, and from date=11, we have the 5th stock V5. My problem is that I need to calculate the rowSums for row=6 (When a new stock was added), _with_ and _without_ the new stock. So my answer for row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the original set of stocks (without V4). Similarly, my answer for row=11 would be 175 for the plain sum, and 175 - 24 = 151 for the original sum (without V5).

Basically I'm interested in finding out the value of the portfolio with and without the new stock for the purposes of creating an index. It's possible that some stocks my get dropped later, in which case there would be an NA series starting for say V1 at row=18 and so on. In that case, the aim would be to find the sum at row=18 with and without the value of V1.

Is there any way I can get the sum over columns, deleting specific colums? To get the columns that are NA in any row, I tried (shown for the first 12 rows):

> apply(ac[1:12,],1,function(y)which(is.na(y)))

Which correctly gives

$`1`
V4 V5
4 5

$`2`
V4 V5
4 5

$`3`
V4 V5
4 5

$`4`
V4 V5
4 5

$`5`
V4 V5
4 5

$`6`
V5
5

$`7`
V2 V3 V5
2 3 5

$`8`
V5
5

$`9`
V5
5

$`10`
V1 V5
1 5

$`11`
integer(0)

$`12`
V5
5

>

But now I'm stuck. I don't how to use this list of indices at each row to exclude my columns.

Any pointers please? Would such an exercise be easier if I use a time-series based object, like a zoo.

TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.


DISCLAIMER AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 15 May 2007 - 14:15:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 15 May 2007 - 15:02:02 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.