Re: [R] Generate a serie of new vars that correlate with existing var

From: Greg Snow <Greg.Snow_at_intermountainmail.org>
Date: Thu 05 Apr 2007 - 14:53:46 GMT


Olivier,

I have thought of adding something like this to a package, but here is my current thinking on the issue.

This question (or something similar) has been asked a few times, so there is some demand for a general answer. I see three approaches:

  1. Have an example of the necessary steps archived in a publicly available place.
  2. Write a function and include it in a non-core package.
  3. Add it to the core of R or a core package.

Number 1 is already in progress, since these e-mails will be part of the archive, though anyone is welcome to add it to the Wiki as well if they think that would be useful.

Your suggestion is number 3, but I would argue that 2 is better than 3, for the simple reason that anything added to the core is implied to be top quality and to offer pretty much any option that most people would think of. Putting it in a non-core package makes it available with fewer implied guarantees of quality.

The question then becomes: what options do we make available? Do we have users specify the entire correlation structure, or just assume the new variables will be independent of each other? What should the function do if the set of correlations results in a matrix that is not positive definite? What if the user wants to have 2 fixed variables? And there are other questions.
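As a rough illustration of how two of those questions might be answered, here is a sketch in which the user supplies the entire target correlation matrix and the function stops if that matrix is not positive definite. The name gen_correlated and its arguments are hypothetical (they are not part of any package), and the example data are made up.

```r
# Hypothetical helper (not from any package): build a matrix whose first
# column is x1 and whose sample correlation matrix equals 'target'.
gen_correlated <- function(x1, target) {
  k <- ncol(target)
  stopifnot(is.matrix(target), nrow(target) == k, isSymmetric(target),
            all(abs(diag(target) - 1) < 1e-8), length(x1) > k)

  # Positive-definiteness check: every eigenvalue must be strictly positive.
  ev <- eigen(target, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) <= 1e-8) {
    stop("target correlation matrix is not positive definite")
  }

  # Standardised x1 plus k - 1 columns of standardised random noise.
  z <- cbind(scale(x1),
             scale(matrix(rnorm((k - 1) * length(x1)), ncol = k - 1)))

  # Remove the existing sample correlation; the first column is unchanged.
  indep <- z %*% solve(chol(var(z)))

  # Impose the target structure, then restore the mean and sd of x1.
  indep %*% chol(target) * sd(x1) + mean(x1)
}

set.seed(1)
x1 <- rnorm(100, 15, 5)
target <- matrix(c(1.0, 0.4, 0.5,
                   0.4, 1.0, 0.3,
                   0.5, 0.3, 1.0), nrow = 3)
res <- gen_correlated(x1, target)
round(cor(res), 3)   # compare with 'target'
```

Stopping with an error is only one possible policy for a non-positive-definite request; adjusting the matrix to a nearby valid one would be another, and choosing between them is exactly the kind of option that would have to be decided.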

My current thinking is that the process is simple enough that it is easier to do this by hand than to remember all the options to a function. There are currently people who run bootstrap and permutation tests without loading the packages that do these, because it is quicker to write the code by hand than to remember the syntax of the functions. I think this type of data generation falls into the same category. But if you, or someone else, think that there is enough justification for a function to do this, and can specify what options it should have, I will be happy to add it to my TeachingDemos package (this seems an appropriate place, since one of the times I want to generate data with a specific correlation structure is when creating an example for students).
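To illustrate the "quicker by hand" point, here is a minimal base-R sketch of a bootstrap interval and a two-sample permutation test. The data, the number of replicates, and the choice of statistic are all made up for the example.

```r
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)   # made-up sample
y <- rnorm(30, mean = 11, sd = 2)   # made-up second sample

# Bootstrap: percentile interval for the mean of x.
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
boot_ci <- quantile(boot_means, c(0.025, 0.975))

# Permutation test: two-sided test of the difference in means.
obs <- mean(y) - mean(x)
pooled <- c(x, y)
perm <- replicate(2000, {
  idx <- sample(length(pooled), length(x))   # random relabelling
  mean(pooled[-idx]) - mean(pooled[idx])
})
p_value <- mean(abs(perm) >= abs(obs))
```

Each procedure is a line or two of replicate() and sample(), which is roughly the amount of code the by-hand data generation takes as well.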

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow@intermountainmail.org
(801) 408-8111
 
 


> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of
> Olivier ETERRADOSSI
> Sent: Thursday, April 05, 2007 3:09 AM
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] Generate a serie of new vars that correlate
> with existingvar
>
> Hello, list
> Why not add the smart proposal by Greg Snow as a built-in
> function in {stats}, just changing the "x234" and "newc"
> lines to allow for more distributions to be generated?
> Or am I missing an already existing function that does that?
> Regards. Olivier
>
>
> # slight modification of the original code by Greg Snow
> # [mailto:Greg.Snow@intermountainmail.org] on April 04, 2007 1:46 AM
>
> # generates ndistr vectors of same mean and sd, with various cor.coeffs
> # input :
> # x1 : a vector
> # ndistr : number of distributions
> # coefs : vector of ndistr correl. coeffs
>
> CorelSets <- function(x1 = rnorm(100, 15, 5), ndistr = 3,
>                       coefs = c(0.4, 0.5, 0.6)) {
>
>   # x2, x3, and x4 in a matrix, these will be modified to meet the criteria
>   x234 <- scale(matrix(rnorm(ndistr * length(x1)), ncol = ndistr))
>
>   # put all into 1 matrix for simplicity
>   x1234 <- cbind(scale(x1), x234)
>
>   # find the current correlation matrix
>   c1 <- var(x1234)
>
>   # cholesky decomposition to get independence
>   chol1 <- solve(chol(c1))
>
>   newx <- x1234 %*% chol1
>
>   # check that we have independence and x1 unchanged
>   # (inside a function the bare checks were silently discarded)
>   stopifnot(isTRUE(all.equal(cor(newx), diag(ndistr + 1))),
>             isTRUE(all.equal(x1234[, 1], newx[, 1])))
>
>   # create new correlation structure
>   newc <- diag(ndistr + 1)
>   newc[1, -1] <- coefs
>   newc[-1, 1] <- coefs
>
>   chol2 <- chol(newc)
>
>   finalx <- newx %*% chol2 * sd(x1) + mean(x1)
>   pairs(finalx)
>
>   # return the generated matrix without printing it
>   invisible(finalx)
> }
> >
> > Dear Greg,
> > Thanks a million!
> > "As good as it gets" :)
> > All the best
> > Nguyen
> >
> > -----Original Message-----
> > From: Greg Snow [mailto:Greg.Snow@intermountainmail.org]
> > Sent: Wednesday, April 04, 2007 1:46 AM
> > To: Nguyen Dinh Nguyen; r-help@stat.math.ethz.ch
> > Subject: RE: [R] Generate a serie of new vars that correlate with
> > existing var
> >
> > Here is one way to do it:......8<.................snip.........8<....
> >
> --
> Olivier ETERRADOSSI
> Maître-Assistant
> CMGD / Equipe "Propriétés Psycho-Sensorielles des Matériaux"
> Ecole des Mines d'Alès
> Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
> tel std: +33 (0)5.59.30.54.25
> tel direct: +33 (0)5.59.30.90.35
> fax: +33 (0)5.59.30.63.68
> http://www.ema.fr
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Received on Fri Apr 06 01:13:59 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 06 Apr 2007 - 08:31:01 GMT.
