I have about 100 variables that I would like to create lags for. Is there a quick way of creating lags for 5 time periods?
Thanks for your help.
VKG
Here is the new Data Step code , also faster than John King(data _null_)'s code and no need to consider about if the variable like var1-var5 . It is applied to any variable name .
data have;
infile cards expandtabs truncover;
input X1 X2 X3 X4 X5;
cards;
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
5 15 25 35 45
6 16 26 36 46
7 17 27 37 47
8 18 28 38 48
9 19 29 39 49
;
run;
%let lag=5;
data x;
do i=1 to &lag;
output;
end;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
proc sql;
select cats(_name_,'_',i,'=lag',i,'(',_name_,')') into : lag separated by ';'
from temp,x ;
quit;
data want;
set have;
&lag ;
run;
Can you post some sample data and your expected output?
Sadly there isn't an easy way 😞
Are you laging the observations for 100 variable or the values variables x1-x100? Example data is always useful and example ouput.
What are you going to do with the results? I find that building an extra 500 variables (100 * 5 lag periods each) might well become difficult to code to use in short order.
This is what you are asking for but I find it hard to believe this is the best solution. Something with array, hash or transpose will probably work. Please provide an example of what you have and what you are looking for:
data have;
input var1 var2 var3 var4 var5;
cards;
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
;run;
data want;
set have;
%macro lag;
%do i = 1 %to 5;
var&i.lag&i = lag&i(var&i);
%end;
%mend;
%lag;
run;
You want lag1-lag5 for each and every variables ?
Sorry. I make a mistake . John King(data _null_)'s code could be better.
Or if you want IML code ?
data have;
input var1 var2 var3 var4 var5;
cards;
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
;run;
data lag1;
if _n_ eq 1 then output;
else output;
set have(rename=(var1-var5=var_lag1_1-var_lag1_5));
run;
data lag2;
if _n_ eq 1 then do;
output;
output;
end;
else output;
set have(rename=(var1-var5=var_lag2_1-var_lag2_5));
run;
data lag3;
if _n_ eq 1 then do;
output;
output;
output;
end;
else output;
set have(rename=(var1-var5=var_lag3_1-var_lag3_5));
run;
data lag4;
if _n_ eq 1 then do;
output;
output;
output;
output;
end;
else output;
set have(rename=(var1-var5=var_lag4_1-var_lag4_5));
run;
data lag5;
if _n_ eq 1 then do;
output;
output;
output;
output;
output;
end;
else output;
set have(rename=(var1-var5=var_lag5_1-var_lag5_5));
run;
data want;
merge have(in=ina) lag1-lag5;
if ina;
run;
May I request you to tell me why you've used output statement before the set statement while you create the datasets lag1-lag5? How it is helpful?
Make a lag value (missing value) for all the variables ? Aren't they what you want ?
@Ksharp wrote:
You want lag1-lag5 for each and every variables ?
That doesn't look right when I run it. I think it is this.
data have;
input var1-var5;
cards;
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
;;;;
run;
proc print;
run;
data Fivelags;
set have end=eof;
if _n_ gt 1 then set have(rename=(var1-var5=lag1_1-lag1_5));
if _n_ gt 2 then set have(rename=(var1-var5=lag2_1-lag2_5));
if _n_ gt 3 then set have(rename=(var1-var5=lag3_1-lag3_5));
if _n_ gt 4 then set have(rename=(var1-var5=lag4_1-lag4_5));
if _n_ gt 5 then set have(rename=(var1-var5=lag5_1-lag5_5));
run;
Proc print width=min;
run;
Here is the example of what I am trying to do, assuming i have 5 variables X1 - X5:
X1 | X2 | X3 | X4 | X5 |
1 | 11 | 21 | 31 | 41 |
2 | 12 | 22 | 32 | 42 |
3 | 13 | 23 | 33 | 43 |
4 | 14 | 24 | 34 | 44 |
5 | 15 | 25 | 35 | 45 |
6 | 16 | 26 | 36 | 46 |
7 | 17 | 27 | 37 | 47 |
8 | 18 | 28 | 38 | 48 |
9 | 19 | 29 | 39 | 49 |
I would like to create a lag for each one of them for 5 time periods resulting i the following data set:
X1 | X2 | X3 | X4 | X5 | X1_1 | X2_1 | X3_1 | X4_1 | X5_1 | X1_2 | X2_2 | X3_2 | X4_2 | X5_2 | X1_3 | X2_3 | X3_3 | X4_3 | X5_3 | X1_4 | X2_4 | X3_4 | X4_4 | X5_4 | X1_5 | X2_5 | X3_5 | X4_5 | X5_5 |
1 | 11 | 21 | 31 | 41 | |||||||||||||||||||||||||
2 | 12 | 22 | 32 | 42 | 1 | 11 | 21 | 31 | 41 | ||||||||||||||||||||
3 | 13 | 23 | 33 | 43 | 2 | 12 | 22 | 32 | 42 | 1 | 11 | 21 | 31 | 41 | |||||||||||||||
4 | 14 | 24 | 34 | 44 | 3 | 13 | 23 | 33 | 43 | 2 | 12 | 22 | 32 | 42 | 1 | 11 | 21 | 31 | 41 | ||||||||||
5 | 15 | 25 | 35 | 45 | 4 | 14 | 24 | 34 | 44 | 3 | 13 | 23 | 33 | 43 | 2 | 12 | 22 | 32 | 42 | 1 | 11 | 21 | 31 | 41 | |||||
6 | 16 | 26 | 36 | 46 | 5 | 15 | 25 | 35 | 45 | 4 | 14 | 24 | 34 | 44 | 3 | 13 | 23 | 33 | 43 | 2 | 12 | 22 | 32 | 42 | 1 | 11 | 21 | 31 | 41 |
7 | 17 | 27 | 37 | 47 | 6 | 16 | 26 | 36 | 46 | 5 | 15 | 25 | 35 | 45 | 4 | 14 | 24 | 34 | 44 | 3 | 13 | 23 | 33 | 43 | 2 | 12 | 22 | 32 | 42 |
8 | 18 | 28 | 38 | 48 | 7 | 17 | 27 | 37 | 47 | 6 | 16 | 26 | 36 | 46 | 5 | 15 | 25 | 35 | 45 | 4 | 14 | 24 | 34 | 44 | 3 | 13 | 23 | 33 | 43 |
9 | 19 | 29 | 39 | 49 | 8 | 18 | 28 | 38 | 48 | 7 | 17 | 27 | 37 | 47 | 6 | 16 | 26 | 36 | 46 | 5 | 15 | 25 | 35 | 45 | 4 | 14 | 24 | 34 | 44 |
Any help programming this would be appreciated vs using lag for each of the 100 variables 5 times.
Regards,
Vaneeta.
The macro that I've provided does just that, just change the 5 to 100.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.