I'm using PC SAS 9.4M3.
I've done some time series modeling, generated scoring code, and I'm trying to predict into the future. But the issue I'm having is that I've been aksed to use AdaptiveReg and not TimeSeries so I had to code the lag variables myself. No problem there.
Where I'm running into issues is with the circular logic, for each row in my data, after the actuals, I need to grab the previous value, score the model, then store that prediction so I can grab it for the next row. Sounds simple enough.
So for a simplificaion let's say I have a dataset like this where A is my dep var and Z is my prediction. It is simply 2*LAG_A. I have 4 months of data and I'm trying to predict out to month 9.
month | a | lag_a | z |
1 | 2 | . | . |
2 | 3 | 2 | 4 |
3 | 5 | 3 | 6 |
4 | 5 | 10 | |
5 | |||
6 | |||
7 | |||
8 | |||
9 |
I tried something like the following:
data scored;
set actuals;/*the above table without column z*/
z=2*Lag_A;
if a=. then a=z;/*take the predicted value as my dep var where I have no actuals*/
temp_lag=lag(a);/*I want the lag to execute each time so all values go into the queue*/
if lag_a=. then lag_a=temp_lag;/*only fill in if I don't have the actual value already there*/
run;
this works great for month 4, but not for month 5 since I need the lag statement before I calculate z (otherwise I don't have a value in lag_a), BUT I can't really do that since it loads the value of a to the queue and before I calculate Z, A is missing.
I tried to do the lag before and after, thinking that doing it before would grab the value from the queue while loading a missing to the queue, and then doing it again after scoring would put the correct value in the queue, but that didn't work as I expected either.
data scored;
set actuals;/*the above table without column z*/
temp_lag=lag(a);/*grab last value from queue*/
z=2*Lag_A;
if a=. then a=z;/*take the predicted value as my dep var where I have no actuals*/
temp_lag=lag(a);/*write the new value of a to the queue*/
if lag_a=. then lag_a=temp_lag;/*only fill in if I don't have the actual value already there*/
run;
The only change is that lag_a does get filled in, but z still doesn't calculate correctly, it's still getting missing values from the queue.
I thought maybe I coudl do SYMGET before calculating Z and then do a CALL SYMPUT after calculations, but my understanding is that the macro var CALL SYMPUT creates isn't available till after the RUN;
Is there any hope?
Numbers are worth thousand words.
Here is a very simple array approach. No need to use lag() function. I assumed your months are numbers like 1 , 2 , ...
But it doesn't affect the array approach. The steps are:
[1] Store the values of A into an array (K[]) which is indexed by MONTH.
[2] Getting the lag1 for Month i is simply to get the value of A corresponding to (i-1)th month.
[3] When A is missing, we use the lag value, compute Z and replace the missing value by Z in the array.
data want;
array k[9] _temporary_;
do until(eof);
set have end = eof;
k[month] = a;
end;
month = 1;
a = k[1];
output;
do i = 2 to 9;
month = i;
if k[i] = . then do;
lag_a = k[i-1];
z = 2 * lag_a;
a = z;
k[i] = a;
end;
else do;
a = k[i];
lag_a = k[i - 1];
z = 2 * lag_a;
end;
output;
end;
drop i;
run;
If you have any problem to handle the array because of then values of month, come back and tell how your real months are. It can be handled by the array.
Hope this solution is acceptable to you.
Store for the NEXT row would use RETAIN.
I think what you want is
Retain Lag_a;
BUT you need to assign a value to it for everytime you expect to need it
I can't tell if the data you show is what you are getting or want.
Macro varibles are really not needed and very likelyt to not work the way you want.
You didn't give much of an example but this may be what your are looking for:
data scored;
set actuals;/*the above table without column z*/
retain lag_A .;
temp_lag=lag(a);/*write the new value of a to the queue*/
if lag_a=. then lag_a=temp_lag;/*only fill in if I don't have the actual value already there*/
z=2*Lag_a;
if a=. then a=z;
lag_A =a; /* saves the current value of a */
run;
Before leaving a record the code I provided sets the Lag_a value to the value of A, not to Lag(a).
Did you try running the code?
The logic for Hash is going to be very similar for Retain.
I see now that I lost the desired results. (I had typed it all up, had to take care of something else, came back, made sure it was ready to post, hit post, but I had timed out so I lost the whole post)
Here is what I want to end up with.
month | a | lag_a | z |
1 | 2 | . | . |
2 | 3 | 2 | 4 |
3 | 5 | 3 | 6 |
4 | 10 | 5 | 10 |
5 | 20 | 10 | 20 |
6 | 40 | 20 | 40 |
7 | 80 | 40 | 80 |
8 | 160 | 80 | 160 |
9 | 320 | 160 | 320 |
Numbers are worth thousand words.
Here is a very simple array approach. No need to use lag() function. I assumed your months are numbers like 1 , 2 , ...
But it doesn't affect the array approach. The steps are:
[1] Store the values of A into an array (K[]) which is indexed by MONTH.
[2] Getting the lag1 for Month i is simply to get the value of A corresponding to (i-1)th month.
[3] When A is missing, we use the lag value, compute Z and replace the missing value by Z in the array.
data want;
array k[9] _temporary_;
do until(eof);
set have end = eof;
k[month] = a;
end;
month = 1;
a = k[1];
output;
do i = 2 to 9;
month = i;
if k[i] = . then do;
lag_a = k[i-1];
z = 2 * lag_a;
a = z;
k[i] = a;
end;
else do;
a = k[i];
lag_a = k[i - 1];
z = 2 * lag_a;
end;
output;
end;
drop i;
run;
If you have any problem to handle the array because of then values of month, come back and tell how your real months are. It can be handled by the array.
Hope this solution is acceptable to you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.