Hi all,
I have a data set that I created using the following code:
data simulation;
  a1 = 0;
  x1 = 50;
  seed = -2000;
  do i = -50 to 100;
    a = rannor(seed);
    x = x1 + a - .6*x1;
    if i > 0 then output;
    x1 = x;
    a1 = a;
  end;
run;
quit;

proc arima data = simulation;
  identify var = x nlag = 20;
  estimate p = 1 noint printall;
  /* identify var = y crosscorr = x nlag = 20; */
  forecast lead = 5 out = boxj;
run;
quit;
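I want to simulate a new variable y that depends on its own past values and on past values of x, as follows:
y(t) = .95*y(t-1) - .225*y(t-2) + 1.0234*x(t-1) - .9*x(t-2) + a(t) - .3*a(t-1)
Is there a way to simulate this variable in SAS? It is recursive and uses values of the already simulated variable x.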
Thank you
Hi,
I think you will need to use the RETAIN statement and explicitly create your recursive lag variables for Y and A in order to simulate your data for Y. Please see if the following allows you to obtain your desired result:
data boxj(keep=x xl1 xl2);
set boxj; /* output data set from PROC ARIMA step */
if x=. then x=forecast; /* fill in missing X values with forecast */
xl1=lag1(x); xl2=lag2(x); /* create lag1 and lag2 variables for X */
run;
/* simulate y(t) = .95*y(t-1) - .225*y(t-2) + 1.0234*x(t-1) - .9*x(t-2) + a(t) - .3*a(t-1) */
data simy;
set boxj(firstobs=3); /* omit first 2 obs with missing Lag X values */
retain yl1 yl2 20 al1 0; /* specify starting values for yl1, yl2, al1 */
call streaminit(907356); /* specify positive seed to reproduce results */
a=rand('normal');
y = .95*yl1 - .225*yl2 + 1.0234*xl1 - .9*xl2 - .3*al1 + a;
output;
yl2=yl1; yl1=y; al1=a; /* generate recursive lag variables for y and a */
run;
I hope this helps!
DW
Use the LAG1() and LAG2() functions.
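As a minimal sketch of what those functions return, the step below applies them to a small made-up data set. The data set names toy and toy_lags and the variable z are hypothetical and used only for illustration.

```sas
/* Hypothetical toy data: five observations of two series, x and z */
data toy;
  input x z;
  datalines;
10 1
20 2
30 3
40 4
50 5
;
run;

data toy_lags;
  set toy;
  x_l1 = lag1(x);   /* x from the previous observation; missing on row 1 */
  x_l2 = lag2(x);   /* x from two observations back; missing on rows 1-2 */
  z_l1 = lag1(z);   /* each LAG call keeps its own queue, so z is lagged independently of x */
run;

proc print data=toy_lags;
run;
```

On the third row, x_l1 is 20 and x_l2 is 10. The first rows have missing lags, which matters as soon as those lags feed a formula.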
Thank you for your reply,
I have never used the LAG functions. I am looking up examples and trying to fit them to my specific data.
Would this be along the lines of creating four new variables for y(t-1), y(t-2), x(t-1) and x(t-2), then plugging them into my y(t) formula?
Would you know of any good examples, especially since I have lags of two different variables, y(t-k) and x(t-k)?
Thank you
This should work:
data sim2;
retain y 0;
set boxj;
y = 123*lag1(y) - 976*lag2(y) + 0.879*lag1(x) - 7654*lag2(x) + 456*a - 234*lag1(a);
run;
(untested)
Hi DW,
Thank you, this does seem to do the job. I am going to run it with different ARIMA models and see whether I am getting close to my goal.
This is greatly appreciated.
Hi PG,
Thank you for the help,
Unfortunately it gives missing values for the created y variable. I am basically simulating a transfer function model (dynamic regression model), and I have the y(t) formula as expanded by Box-Jenkins. I wonder why it gives missing values.
Again Thank you
@Dids wrote:
Hi PG,
Thank you for the help,
Unfortunately it gives missing values for the created y variable. I am basically simulating a transfer function model (dynamic regression model), and I have the y(t) formula as expanded by Box-Jenkins. I wonder why it gives missing values.
Again Thank you
Probably because you do not have any value of Y for the first record or two. If the formula is to look at two prior periods and they are missing then expect the result to be missing. "Recursion" has to start with something.
Example: the Fibonacci sequence is characterized by the fact that every number after the first two is the sum of the two preceding ones.
Values are defined as F(n) = F(n-1) + F(n-2),
with seed values F(1) = 1 and F(2) = 1.
So you need seed (starting) values of Y for the first two records in your data. Since you have not mentioned whether Y is one of the results shown, that's about as far as I can get on this particular issue.
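To illustrate the point about seeds, here is a small sketch of the Fibonacci recursion in a DATA step. The data set name fib and the variable names f, f1 and f2 are chosen only for this example.

```sas
data fib;
  retain f1 1 f2 1;          /* seed values: f1 holds F(n-1), f2 holds F(n-2) */
  do n = 1 to 10;
    if n <= 2 then f = 1;    /* the first two terms are the seeds themselves */
    else do;
      f = f1 + f2;           /* F(n) = F(n-1) + F(n-2) */
      f2 = f1;               /* shift the stored lags for the next step */
      f1 = f;
    end;
    output;
  end;
  keep n f;
run;
```

If f1 and f2 started out missing instead of 1, the plus operator would propagate the missing value and every term from n = 3 on would be missing, which is the same behavior seen with Y.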
As soon as yt is missing, yt+1 will be missing also, and so on... The only way out is to prevent y from being missing. For example:
data simulation;
call streaminit(7556);
x = 50;
do i = -50 to 100;
a = rand("normal");
x = x + a - .6*x;
output;
end;
run;
data sim2;
retain y 0;
set simulation;
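/* y is retained, so on entry it already holds y(t-1); lag1(y) returns the value y had at the previous call, i.e. y(t-2) */
y = coalesce(.95*y - .225*lag1(y) + 1.0234*lag1(x) - .9*lag2(x) - .3*lag1(a) + a, 0);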
if i > 0 then output;
drop i;
run;
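A LAG call returns the value its argument had the last time that call executed, so applying it to a variable that is assigned in the same step is easy to get wrong. As a hedged alternative, the sketch below carries every lag in a retained variable and simulates x and y in one step. The names sim3, e, x1, x2, y1, y2 and a1 are made up for this example; the starting values and the use of a separate noise series e for x are assumptions, not something stated in the thread.

```sas
data sim3;
  call streaminit(7556);
  /* assumed starting values for the lags of x, y and a */
  retain x1 50 x2 50 y1 0 y2 0 a1 0;
  do i = -50 to 100;
    e = rand("normal");          /* innovation for x (assumed separate from a) */
    a = rand("normal");          /* noise term for y */
    x = x1 + e - .6*x1;          /* same AR(1) form for x as in the thread */
    y = .95*y1 - .225*y2 + 1.0234*x1 - .9*x2 + a - .3*a1;
    if i > 0 then output;        /* discard the first 51 iterations as burn-in */
    x2 = x1; x1 = x;             /* shift the lags after the output */
    y2 = y1; y1 = y;
    a1 = a;
  end;
  keep x y a;
run;
```

Because every lag has a starting value, y is never missing, and the burn-in lets the effect of those arbitrary starting values die out before any observation is kept.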