# Re: [R] Generating Data using Formulas

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Thu, 31 May 2007 07:35:59 +0100 (BST)

On Wed, 30 May 2007, Charles C. Berry wrote:

> Christian,
>
> The formula language is not suited to such recursive usage
> AFAICS.

But filter() is. In this case the result is an AR(1) process, so arima.sim() could be used (internally it uses filter()).
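
As a minimal sketch of the filter() route (the seed, the variable names and the loop check are illustrative assumptions, not part of the original exercise): a recursive filter with coefficient 0.8 applied to 1 + u reproduces y[t] = 1 + 0.8*y[t-1] + u[t], and the default initial values of zero correspond to y0 = 0.

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
n <- 25
u <- rnorm(n)                # NID(0, 1) errors

# Recursive filter: y[t] = x[t] + 0.8 * y[t-1], initial value 0 by default.
# Feeding x = 1 + u puts the intercept inside the recursion.
y <- as.numeric(stats::filter(1 + u, filter = 0.8, method = "recursive"))

# Cross-check against an explicit loop starting from y0 = 0
y_loop <- numeric(n)
prev <- 0
for (t in seq_len(n)) {
  y_loop[t] <- 1 + 0.8 * prev + u[t]
  prev <- y_loop[t]
}
all.equal(y, y_loop)

# OLS of y[t] on y[t-1] for this one sample
coef(lm(y[-1] ~ y[-n]))
```

Wrapping the last two steps in replicate() and taking rowMeans() of the result would give the averages the exercise asks for.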

I know this is an exercise, but using 'y0 = 0' is unrealistic: arima.sim allows you to do better.
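
A hedged sketch of what that might look like, assuming the point is that arima.sim() discards a burn-in period so the series does not start from a fixed zero: arima.sim() simulates a zero-mean process, so the stationary mean 1/(1 - 0.8) implied by the intercept is added back here. The seed and the names mu, y and fit are illustrative.

```r
set.seed(2)                  # arbitrary seed, only for reproducibility
n <- 25

# Stationary mean implied by y[t] = 1 + 0.8 * y[t-1] + u[t]
mu <- 1 / (1 - 0.8)

# Zero-mean AR(1) with coefficient 0.8 and N(0, 1) innovations (the default),
# generated after a burn-in, then shifted to the stationary mean
y <- mu + as.numeric(arima.sim(model = list(ar = 0.8), n = n))

# OLS of y[t] on y[t-1]; the coefficients estimate the intercept 1 and slope 0.8
fit <- lm(y[-1] ~ y[-n])
coef(fit)
```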

> You can _vectorize_ your code like this:
>
> cmat <- outer( 1:25, 1:25, function(y,x) ifelse( x>y, 0, 0.8^(y-x) ) )
> res <- replicate(1000,{
> y <- cmat %*% (1 + rnorm(25)) # the intercept enters the recursion, so it is filtered too
> coef(lm(y[-1]~y[-25]))
> })
> rowMeans(res) # mean of 1000 replicates
>
> HTH,
>
> Chuck
>
> On Tue, 29 May 2007, Christian Falde wrote:
>
>> Hello,
>>
>> My name is Christian Falde. I am new to R.
>>
>> My problem is this. I am attempting to learn R on my own. In so doing I
>> am using some problems from Davidson and MacKinnon Econometric Theory
>> and Methods to do so. This is because I can already do some of the
>> problems in SAS so I am attempting to rework them using R. Seemed
>> logical to me, now I am stuck and it's really bugging me.
>>
>>
>> The problem is this
>>
>> Generate a data set sample size of 25 with the formula y=1+.8*y(t-1)+ u.
>> Where y is the dependent, y(t-1) is the dependent variable lagged one
>> period, and u is the classical error term. Assume y0=0 and the u is
>> NID(0,1). Use this sample to compute the OLS estimates B1 (1) and
>> B2(.8). Repeat at least 100 times and find the average of the B's.
>> Use these averages to estimate the bias of the OLS estimators.
>>
>> To start I did the following non lagged program.
>>
>> final<-function(i,j){x<-function(i) {10*i}
>> y<-function(i,j) {1+.8*10*i+100*rnorm(j)}
>> datathreeone<- data.frame(replicate(100,coef(lm(y(i,j)~x(i)))))
>> rowMeans(datathreeone)}
>> final(1:25,25)
>> final(1:50,50)
>> final(1:100,100)
>> final(1:200,200)
>> final(1:10000,10000)
>>
>>
>> Now the "only" thing I need to do is change ".8*10*i" which is
>> exogenous to ".8* y(t-1) ".
>>
>> There are two reasons why I did it this way. I needed the rnorm(i) to
>> generate a new set of u's each replication, and I wanted to be able to
>> use the function as I did to make the results more concise.
>>
>> For the lag in SAS we used an if then else logic relating to the number
>> of observation. This in R would have to be linked to the invisible row
>> number. I think I need an index variable for the row. Perhaps, sorry
>> thinking while typing.
>>
>> Another reason why I am stuck, the lag function was seemingly straightforward.
>>
>> lag (x, k=1)
>>
>> yet x has to be a matrix so when I tried to do it like above with y as a
>> function R complained.
>>
>> I have been working on this for a couple of days now so everything is
>> beginning to not make sense. It just seems to me to get the matrix to
>> work out I would need to have two matrices.
>>
>> dependent and explanatory
>> y1 = sum ( 1 +.8*0 + 100*rnorm(i))
>> y2 = sum ( 1 +.8* (dependent row 1) + 100*rnorm(i))
>> etc
>>
>> I just am not sure how to do that.
>>
>>
>> christian falde
>>
> [snip]
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cberry_at_tajo.ucsd.edu UC San Diego
> http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.
>

--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595