I have a problem with lag function in do loop. I want to get lag1, lag2, lag3,...,lagn but without using macro so I tried to do it in this way but it doesn't work.
retain S 0;
do i=1 to 10;
S=S+lag(x);
end;
Has anybody any solutions?
can you provide a small example of data that you have and data that you are trying to get?
data example;
input id price;
datalines;
1 437.83
2 458.5
3 457
4 456.98
5 459.11
6 453.91
7 444.05
8 456.47
9 463.25
10 474
;
run;
and I want to count WMA(Weigthed Moving Average) of prices. It looks like this for period n:
WMA=(n*Price+(n-1)*lag(Price)+...+1*lag(n-1)(Price))/(1+...+n).
Do not use + in these calculations if you want the partial moving average for the first records as lag(n), in your notation, will not have a value until record n+1. Any calculation using + or *with missing will yield missing. So use SUM.
Just to clarify, then, please confirm ...
The WMA calculation should use a different number of terms on each observation with no limit on the maximum number of terms?
The calculations should never start over mid-stream when some other variable (not shown in the example) changes?
Thanks!
Well, it's not clear what you are trying to accomplish here. But it is likely that macro language would not be involved. If you want separate values for lag1, ..., lagn, just code it:
data want;
set have;
lag1 = lag1(x);
lag2 = lag2(x);
...
lagn = lagn(x);
run;
If you want the sum of the prior 10 values of x, you only need to use two lag functions:
data want;
set have;
back1 = lag(x);
back11 = lag11(x);
S + back1;
if back11 > . then S = S - back11;
run;
You might have to tweak the logic, depending on the result you want for the first 10 observations.
if it's something else entirely that you are seeking, you'll have to elaborate.
Good luck.
If you need to do something with a parameter to pick which lag you might need to look into SAS/ IML where the data is a matrix.
The other solution is to create all of the lagged variables you need:
lagx1= lag(x);
lagx2= lag2(x);
etc.
array lags lagx1 - lagx10;
do I= 1 to 10;
S= sum(S,lags);
end;
drop lagx: ;
I use sum as for the first record Lagx1 to lagx10 will be missing, for the second record lag2 to lag10 will be missing. So using + you wouldn't have gotten any values accumulated for S until the 11th record because + missing yields missing.
It seems that you would like to calculate some kind if simple moving average (as Astounding described). However, if you simply would like to get the "i" attached to the lag function, you could try call execute:
Data A (Drop=i);
Do i=1 To 100;
x=1;
Output;
End;
Run;
Data _NULL_;
Call Execute ('Data B;');
Call Execute (' Set A;');
Do i=1 To 10;
Call Execute (' s=Sum(s,lag'!!Compress(Put(i,2.))!!'(x));');
End;
Call Execute (' Output;');
Call Execute ('Run;');
Run;
If you have SAS/ETS, PROC EXPAND would be the first place you want to take a look, it may not get what you want at one shot though.
The core of your issue, I suppose, is that you want to attain a decent amount of flexibility/ scalability while not to use Macro Code-Gen to jam your SAS code ? LAG() is nothing but a queue function, there are other ways to realize queues without using LAG(), one of which is Hash. Although one can construct a dynamic Hash or Array at the size of your lagging range, it is heck lot easier to dump everything into the Hash, with a key that maps data step sequenced iterations.
data example;
input id price;
n=_n_;
datalines;
1 437.83
2 458.5
3 457
4 456.98
5 459.11
6 453.91
7 444.05
8 456.47
9 463.25
10 474
;
data want;
retain _lg 4;/*the lagging range*/
if _n_=1 then
do;
dcl hash h(dataset:'example (keep=n price )', ordered:'a');
h.definekey('n');
h.definedata('price');
h.definedone();
call missing (n,price);
end;
set example(keep=id);
if _n_>=_lg then
do;
do _i=0 to _lg-1;
_rc=h.find(key:_n_-_i);
_nom+(_lg-_i)*price;
_denom+(_i+1);
end;
wma=_nom/_denom;
output;
call missing (_nom, _denom);
end;
drop n price _:;
run;
And here is a dynamic Hash approach, in case your computer memory is not large enough:
data example;
input id price;
datalines;
1 437.83
2 458.5
3 457
4 456.98
5 459.11
6 453.91
7 444.05
8 456.47
9 463.25
10 474
;
data want_dym;
retain _lg 4;/*the lagging range, this maps to the 'n-1' in LAGn()*/
if _n_=1 then
do;
dcl hash h(ordered:'a');
h.definekey('_n_');
h.definedata('price');
h.definedone();
dcl hiter hi('h');
end;
set example;
_rc=h.add();
if _n_>=_lg then
do;
do _i=0 to _lg-1;
_rc=h.find(key:_n_-_i);
_nom+(_lg-_i)*price;
_denom+(_i+1);
end;
wma=_nom/_denom;
output;
call missing (_nom, _denom);
_rc=h.remove(key:_n_-_lg+1);
end;
drop price _:;
run;
SQL can do this:
data example; input id price; datalines; 1 437.83 2 458.5 3 457 4 456.98 5 459.11 6 453.91 7 444.05 8 456.47 9 463.25 10 474 ; run; %let lag=10 ; proc sql; create table want as select *,(select sum(id*price)/sum(id) from example where id between a.id-&lag and a.id) as wma from example as a; quit;
Xia Keshan
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.