Hello,
I am attempting to lag a variable by region, month, and year. Essentially, I need to create a new variable that is the lagged mean value (by year and month) for a neighboring region.
I have this:
region month year mean
1 1 00 5
1 2 00 6
1 3 00 4
1 1 01 4
1 2 01 5
1 3 01 5
2 1 00 4
2 2 00 5
2 3 00 3
2 1 01 3
2 2 01 5
2 3 01 2
and I need:
region month year mean neighbor_mean_lag
1 1 00 5 .
1 2 00 6 .
1 3 00 4 .
1 1 01 4 10
1 2 01 5 13
1 3 01 5 16
2 1 00 10 .
2 2 00 13 .
2 3 00 16 .
2 1 01 3 .
2 2 01 5 .
2 3 01 2 .
(Note neighbor_mean_lag equals the neighboring region's monthly values for the previous year.)
I am able to create a lagged variable for the same region using the below code. However, I haven't been successful at creating the lagged variable for a neighboring region.
DATA lag3;
SET lag2;
BY region month year;
lag1_mean=lag(mean);
if first.month and first.year then do;
lag1_mean=.;
end;
RUN;
Thank you for your help!
A self join will be simpler than using LAG
data have;
input region month year mean;
datalines;
1 1 00 5
1 2 00 6
1 3 00 4
1 1 01 4
1 2 01 5
1 3 01 5
2 1 00 4
2 2 00 5
2 3 00 3
2 1 01 3
2 2 01 5
2 3 01 2
;
data have1;
set have; by region notsorted;
reg + first.region;
run;
proc sql;
create table want as
select
a.region,
a.month,
a.year,
a.mean,
b.mean as neighbor_mean_lag
from
have1 as a left join
have1 as b
on b.reg = a.reg+1 and
b.year = a.year-1 and
b.month = a.month
order by a.reg, year, month;
quit;
looks like it is not a mean, it is a MEAN variable value for the next region. I would try Hash Table.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.