DATA Step, Macro, Functions and more

dynamic lag values

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

dynamic lag values

I have trouble understanding lag function in SAS.  In this sample code I want to update ‘Comm_Inc’  and ‘Goal’ based on the previous month ‘Goal’.

201610 ‘Goal’ is updated correctly with 15000+Comm_Income

However 201611 and 201612 isnt updated with lag4 values of Goal. I can do this by looping through months multiple times. Can this can be done using lag option ?

 

 

DATA temp;

 INFILE DATALINES DSD;

 INPUT month_end $ group $ Goal Comm Comm_Inc RowNum;

DATALINES;

201610,East,15000,0.2,3000,1

201610,North,20000,0.3,6000,2

201610,South,10000,0.1,1000,3

201610,West,22000,0.1,2200,4

201611,East,,0.2,,5

201611,North,,0.3,,6

201611,South,,0.1,,7

201611,West,,0.1,,8

201612,East,,0.2,,9

201612,North,,0.3,,10

201612,South,,0.1,,11

201612,West,,0.1,,12

;

 

proc sort data=temp;

by month_end group;

run;

 

 

data  temp;

set temp;

by month_end group;

retain Goal;

if month_end = 201610  then do;

Goal=Goal;

Comm_Inc = Goal*Comm;

Goal = Goal+Comm_Inc;

end;

 

else do;

Goal=lag4(Goal);

Comm_Inc = Goal*Comm;

Goal = Goal+Comm_Inc;

end;

 

run;

 

 

Final Expected Output:

month_end

group

Goal

Comm

Comm_Inc

201610

East

18000

0.2

$3,000

201610

North

26000

0.3

$6,000

201610

South

11000

0.1

$1,000

201610

West

24200

0.1

$2,200

201611

East

21600

0.2

$3,600

201611

North

33800

0.3

$7,800

201611

South

12100

0.1

$1,100

201611

West

26620

0.1

$2,420

201612

East

25920

0.2

$4,320

201612

North

43940

0.3

$10,140

201612

South

13310

0.1

$1,210

201612

West

29282

0.1

$2,662

 


Accepted Solutions
Solution
‎03-01-2017 04:07 PM
Super User
Posts: 5,092

Re: dynamic lag values

Yes, LAG is more complicated than it looks.  In particular, you need to execute it on every observation for it to get the results you expect.  It's hardly ever correct to use LAG inside a DO group.

 

I would recommend two sets of changes.  First, do not replace GOAL, the existing variable.  Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables.  Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value.  Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.

 

Second, change the order.  Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END.  In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.

 

Good luck.

 

View solution in original post


All Replies
Solution
‎03-01-2017 04:07 PM
Super User
Posts: 5,092

Re: dynamic lag values

Yes, LAG is more complicated than it looks.  In particular, you need to execute it on every observation for it to get the results you expect.  It's hardly ever correct to use LAG inside a DO group.

 

I would recommend two sets of changes.  First, do not replace GOAL, the existing variable.  Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables.  Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value.  Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.

 

Second, change the order.  Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END.  In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.

 

Good luck.

 

Occasional Contributor
Posts: 17

Re: dynamic lag values

This is the solution I got from your recommendation. Thanks

proc sort data=temp;
by group month_end ;
run;


data temp1;
set temp;
by group month_end;
retain Calculated_Goal Comm_Inc;
if first.group then do;
Comm_Inc = Goal*Comm;
Calculated_Goal = Goal+Comm_Inc;
Goal = Calculated_Goal;
end;
else do;
Comm_Inc = Calculated_Goal*Comm;
Calculated_Goal = Calculated_Goal+Comm_Inc;
end;

run;
Valued Guide
Posts: 797

Re: dynamic lag values

There are a couple problems, both related to the fact that LAGs are queue-management functions, not "lookback" like in excel.

 

 

  1. Using lag function inside an IF condition.  This means that the 4-deep lag queue (from lag4(goal)) is not updated with every observation.  Typically you want the queue updated even if you are not using the result.  This could be easily fixed.
  2. More problematic, I think, is the fact the you have
       goal=lag4(goal) ;   
    which immediately updates the 4-item queue of goal values.  ... followed later by
       goal=goal+comm_inc
    which updates the value of goal in the program data vector, but does not update the corresponding element in the lag4 queue.  As a result that updated value of goal is not available to subsequent lag4 executions.

 

 

Occasional Contributor
Posts: 17

Re: dynamic lag values

Thanks for helping me understand issues with this code. I can fix each of the code issues that you pointed out. Instead i changed the sort order, created new variable to get away without lag

Valued Guide
Posts: 797

Re: dynamic lag values

This can be done with lag, even when sorted by monthend/group.

 

It's not initialy intutitive, but it does allow for a very simple program: 

 

data want (drop=oldgoal);
  oldgoal=lag3(goal);
  set temp;
  if _n_<=4 then goal=goal+comm_inc;
  else do;
    comm_inc=comm*oldgoal;
    goal=oldgoal+comm_inc;
  end;
run;

 

It has three main features:

  1. It takes the lag of goal and puts it in another var (oldgoal), as per @Astounding's suggestion.
  2. Because you were modifying the GOAL variable after retrieving its value, the lag queue was not getting the modified value.  To remedy this I moved the LAG function to precede the SET statement.  This allowed it to work from the modified values (but note it's the modified value of the PRIOR obs, which is about to be replaced by the SET statement).
  3. Instead of lag4 (for 4 regions: east/north/south/west), I have to use LAG3.  This is because I placed the lag function prior to the SET statement, so I need to go back one less entry.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 200 views
  • 1 like
  • 3 in conversation