I have trouble understanding lag function in SAS. In this sample code I want to update ‘Comm_Inc’ and ‘Goal’ based on the previous month ‘Goal’.
201610 ‘Goal’ is updated correctly with 15000+Comm_Income
However 201611 and 201612 isnt updated with lag4 values of Goal. I can do this by looping through months multiple times. Can this can be done using lag option ?
DATA temp;
INFILE DATALINES DSD;
INPUT month_end $ group $ Goal Comm Comm_Inc RowNum;
DATALINES;
201610,East,15000,0.2,3000,1
201610,North,20000,0.3,6000,2
201610,South,10000,0.1,1000,3
201610,West,22000,0.1,2200,4
201611,East,,0.2,,5
201611,North,,0.3,,6
201611,South,,0.1,,7
201611,West,,0.1,,8
201612,East,,0.2,,9
201612,North,,0.3,,10
201612,South,,0.1,,11
201612,West,,0.1,,12
;
proc sort data=temp;
by month_end group;
run;
data temp;
set temp;
by month_end group;
retain Goal;
if month_end = 201610 then do;
Goal=Goal;
Comm_Inc = Goal*Comm;
Goal = Goal+Comm_Inc;
end;
else do;
Goal=lag4(Goal);
Comm_Inc = Goal*Comm;
Goal = Goal+Comm_Inc;
end;
run;
Final Expected Output:
month_end |
group |
Goal |
Comm |
Comm_Inc |
201610 |
East |
18000 |
0.2 |
$3,000 |
201610 |
North |
26000 |
0.3 |
$6,000 |
201610 |
South |
11000 |
0.1 |
$1,000 |
201610 |
West |
24200 |
0.1 |
$2,200 |
201611 |
East |
21600 |
0.2 |
$3,600 |
201611 |
North |
33800 |
0.3 |
$7,800 |
201611 |
South |
12100 |
0.1 |
$1,100 |
201611 |
West |
26620 |
0.1 |
$2,420 |
201612 |
East |
25920 |
0.2 |
$4,320 |
201612 |
North |
43940 |
0.3 |
$10,140 |
201612 |
South |
13310 |
0.1 |
$1,210 |
201612 |
West |
29282 |
0.1 |
$2,662 |
Yes, LAG is more complicated than it looks. In particular, you need to execute it on every observation for it to get the results you expect. It's hardly ever correct to use LAG inside a DO group.
I would recommend two sets of changes. First, do not replace GOAL, the existing variable. Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables. Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value. Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.
Second, change the order. Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END. In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.
Good luck.
Yes, LAG is more complicated than it looks. In particular, you need to execute it on every observation for it to get the results you expect. It's hardly ever correct to use LAG inside a DO group.
I would recommend two sets of changes. First, do not replace GOAL, the existing variable. Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables. Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value. Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.
Second, change the order. Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END. In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.
Good luck.
There are a couple problems, both related to the fact that LAGs are queue-management functions, not "lookback" like in excel.
Thanks for helping me understand issues with this code. I can fix each of the code issues that you pointed out. Instead i changed the sort order, created new variable to get away without lag
This can be done with lag, even when sorted by monthend/group.
It's not initialy intutitive, but it does allow for a very simple program:
data want (drop=oldgoal);
oldgoal=lag3(goal);
set temp;
if _n_<=4 then goal=goal+comm_inc;
else do;
comm_inc=comm*oldgoal;
goal=oldgoal+comm_inc;
end;
run;
It has three main features:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.