I have trouble understanding lag function in SAS. In this sample code I want to update ‘Comm_Inc’ and ‘Goal’ based on the previous month ‘Goal’.
201610 ‘Goal’ is updated correctly with 15000+Comm_Income
However 201611 and 201612 isnt updated with lag4 values of Goal. I can do this by looping through months multiple times. Can this can be done using lag option ?
DATA temp;
INFILE DATALINES DSD;
INPUT month_end $ group $ Goal Comm Comm_Inc RowNum;
DATALINES;
201610,East,15000,0.2,3000,1
201610,North,20000,0.3,6000,2
201610,South,10000,0.1,1000,3
201610,West,22000,0.1,2200,4
201611,East,,0.2,,5
201611,North,,0.3,,6
201611,South,,0.1,,7
201611,West,,0.1,,8
201612,East,,0.2,,9
201612,North,,0.3,,10
201612,South,,0.1,,11
201612,West,,0.1,,12
;
proc sort data=temp;
by month_end group;
run;
data temp;
set temp;
by month_end group;
retain Goal;
if month_end = 201610 then do;
Goal=Goal;
Comm_Inc = Goal*Comm;
Goal = Goal+Comm_Inc;
end;
else do;
Goal=lag4(Goal);
Comm_Inc = Goal*Comm;
Goal = Goal+Comm_Inc;
end;
run;
Final Expected Output:
month_end |
group |
Goal |
Comm |
Comm_Inc |
201610 |
East |
18000 |
0.2 |
$3,000 |
201610 |
North |
26000 |
0.3 |
$6,000 |
201610 |
South |
11000 |
0.1 |
$1,000 |
201610 |
West |
24200 |
0.1 |
$2,200 |
201611 |
East |
21600 |
0.2 |
$3,600 |
201611 |
North |
33800 |
0.3 |
$7,800 |
201611 |
South |
12100 |
0.1 |
$1,100 |
201611 |
West |
26620 |
0.1 |
$2,420 |
201612 |
East |
25920 |
0.2 |
$4,320 |
201612 |
North |
43940 |
0.3 |
$10,140 |
201612 |
South |
13310 |
0.1 |
$1,210 |
201612 |
West |
29282 |
0.1 |
$2,662 |
Yes, LAG is more complicated than it looks. In particular, you need to execute it on every observation for it to get the results you expect. It's hardly ever correct to use LAG inside a DO group.
I would recommend two sets of changes. First, do not replace GOAL, the existing variable. Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables. Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value. Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.
Second, change the order. Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END. In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.
Good luck.
Yes, LAG is more complicated than it looks. In particular, you need to execute it on every observation for it to get the results you expect. It's hardly ever correct to use LAG inside a DO group.
I would recommend two sets of changes. First, do not replace GOAL, the existing variable. Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables. Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value. Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.
Second, change the order. Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END. In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.
Good luck.
There are a couple problems, both related to the fact that LAGs are queue-management functions, not "lookback" like in excel.
Thanks for helping me understand issues with this code. I can fix each of the code issues that you pointed out. Instead i changed the sort order, created new variable to get away without lag
This can be done with lag, even when sorted by monthend/group.
It's not initialy intutitive, but it does allow for a very simple program:
data want (drop=oldgoal);
oldgoal=lag3(goal);
set temp;
if _n_<=4 then goal=goal+comm_inc;
else do;
comm_inc=comm*oldgoal;
goal=oldgoal+comm_inc;
end;
run;
It has three main features:
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.