# Re: [R] Mixture of Distributions

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Tue, 04 Mar 2008 04:20:36 +0000 (UTC)

(Ted Harding) <Ted.Harding_at_manchester.ac.uk> wrote in news:XFMail.080221222007.Ted.Harding_at_manchester.ac.uk:

>
> On 21-Feb-08 21:39:56, Ted Harding wrote:

```>> On 21-Feb-08 20:58:25, Evgenia wrote:
>>>
>>> Dear R users,
>>> I would like to sample from a mixture distribution
>>> p1*f1+p2*f2+p3*f3 with f1,f2,f3 three different forms
>>> of distributions. I know that in the case of two
>>> distributions I have to sample the mixture compoment
>>> membership.
>>>
>>> How can I weight my three distributions with their
>>> respective probabilities?
>>>
>>> Evgenia
>>
>> There are several ways in which you could write code
>> to do this, but all amount to the fact that each value
>> sampled from such a mixture is obtained by
>> a) choose between f1, f2, f3 at random according to the
>>    probabilities p1, p2, p3 (it is assumed p1+p2+p3=1).
>> b) sample 1 value from whichever of f1, f2, f3 was chosen.
>>
>> Suggestion:
>>
>> Suppose the functions rf1(n), rf2(n), rf3(n) respectively
>> sample n values from f1, f2 and f3.
```

>
> S0 <- cbind(rf1(n),rf2(n),rf3(n))
> ix <- sample(c(1,2,3),n,3,prob=c(p1,p2,p3),replace=TRUE)
>
> So far so good!
>
> S <- S0[,ix]
>
> But this will not do what I intended (i.e. select element ix[1]
> from the first row os S0, ix[2] from the second row, and so on).
>
> Instead, it will return an nxn matrix with n rows, and n columns
> which are copies of columns of S0 in the order selected by ix!
>
> The following will do the correct thing, though there must
> be a neater way of doing it! The example below has been
> corrected.
>
> S <- S0[,1]*(ix==1) + S0[,2]*(ix==2) + S0[,3]*(ix==3)

I discovered a shorter method (and I even think I might understand why it works.) Try:

S0[col(S0)==ix]

R loops through the S0 matrix and finds only those entries where the column numbers match the ix[] values. I would have thought that the first row and ix[1]th column entry should be first but the machinery seems to be working differently. The user should realize that all the column==1 hits will occur first

```#-----toy code-------
rf1 <- function(n){rnorm(n,0,1)} ## normal distribution with mean=0
rf2 <- function(n){rnorm(n,4,1)} ## normal distribution with mean=4
rf3 <- function(n){rnorm(n,9,1)} ## normal distribution with mean=9
```
S0 <- cbind(rf1(10),rf2(10),rf3(10))
p1 <- 0.2; p2 <- 0.3; p3 <-0.5

set.seed<-37
ix <- sample(c(1,2,3),10,prob=c(p1,p2,p3),replace=TRUE) #> ix
# [1] 2 3 1 3 1 2 1 2 2 1
S0
#----------
S0

[,1] [,2] [,3]

``` [1,]  0.5143426 3.823013  9.666210
[2,]  0.7044110 6.095287 10.276792
[3,] -0.3597926 5.848588  8.801378
[4,]  0.1346979 5.518667  8.357181
[5,] -0.6856507 2.882090  9.893790
[6,]  0.5882977 4.864201  8.708929
[7,] -0.4241657 2.891672  8.056254
[8,]  1.6243569 2.981563 11.684422
[9,] -0.3072410 4.379916 10.679470
```

[10,] -0.6289604 3.119195 7.216593
```#--------------------------

S0[col(S0)==ix]
```

# [1] 0.1346979 -0.3072410 3.8230128 5.8485880 4.8642010 2.9815631 # [7] 10.2767916 9.8937904 8.0562542 7.2165927

R seems to be looping through the columns "looking" for the TRUE values in the col(S0)==ix matrix.

col(S0)==ix

[,1] [,2] [,3]

``` [1,] FALSE  TRUE FALSE
[2,] FALSE FALSE  TRUE
[3,] FALSE  TRUE FALSE
[4,]  TRUE FALSE FALSE
[5,] FALSE FALSE  TRUE
[6,] FALSE  TRUE FALSE
[7,] FALSE FALSE  TRUE
[8,] FALSE  TRUE FALSE
[9,]  TRUE FALSE FALSE
```

[10,] FALSE FALSE TRUE
```--
Regards;
David Winsemius

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help