BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
arunmmw
Fluorite | Level 6

I have trouble understanding lag function in SAS.  In this sample code I want to update ‘Comm_Inc’  and ‘Goal’ based on the previous month ‘Goal’.

201610 ‘Goal’ is updated correctly with 15000+Comm_Income

However 201611 and 201612 isnt updated with lag4 values of Goal. I can do this by looping through months multiple times. Can this can be done using lag option ?

 

 

DATA temp;

 INFILE DATALINES DSD;

 INPUT month_end $ group $ Goal Comm Comm_Inc RowNum;

DATALINES;

201610,East,15000,0.2,3000,1

201610,North,20000,0.3,6000,2

201610,South,10000,0.1,1000,3

201610,West,22000,0.1,2200,4

201611,East,,0.2,,5

201611,North,,0.3,,6

201611,South,,0.1,,7

201611,West,,0.1,,8

201612,East,,0.2,,9

201612,North,,0.3,,10

201612,South,,0.1,,11

201612,West,,0.1,,12

;

 

proc sort data=temp;

by month_end group;

run;

 

 

data  temp;

set temp;

by month_end group;

retain Goal;

if month_end = 201610  then do;

Goal=Goal;

Comm_Inc = Goal*Comm;

Goal = Goal+Comm_Inc;

end;

 

else do;

Goal=lag4(Goal);

Comm_Inc = Goal*Comm;

Goal = Goal+Comm_Inc;

end;

 

run;

 

 

Final Expected Output:

month_end

group

Goal

Comm

Comm_Inc

201610

East

18000

0.2

$3,000

201610

North

26000

0.3

$6,000

201610

South

11000

0.1

$1,000

201610

West

24200

0.1

$2,200

201611

East

21600

0.2

$3,600

201611

North

33800

0.3

$7,800

201611

South

12100

0.1

$1,100

201611

West

26620

0.1

$2,420

201612

East

25920

0.2

$4,320

201612

North

43940

0.3

$10,140

201612

South

13310

0.1

$1,210

201612

West

29282

0.1

$2,662

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Yes, LAG is more complicated than it looks.  In particular, you need to execute it on every observation for it to get the results you expect.  It's hardly ever correct to use LAG inside a DO group.

 

I would recommend two sets of changes.  First, do not replace GOAL, the existing variable.  Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables.  Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value.  Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.

 

Second, change the order.  Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END.  In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.

 

Good luck.

 

View solution in original post

5 REPLIES 5
Astounding
PROC Star

Yes, LAG is more complicated than it looks.  In particular, you need to execute it on every observation for it to get the results you expect.  It's hardly ever correct to use LAG inside a DO group.

 

I would recommend two sets of changes.  First, do not replace GOAL, the existing variable.  Instead, create a new variable CALCULATED_GOAL (or whatever you want to call it) based on GOAL and your other variables.  Because each time the SET statement brings in a new observation, it wipes out your previous GOAL value.  Use a RETAIN statement to preserve the value of that new variable as you move from observation to observation.

 

Second, change the order.  Instead of sorting by MONTH_END GROUP, sort by GROUP MONTH_END.  In that way, you can process te observations sequentially and just need to examine first.group and last.group to detect when GROUP changes.

 

Good luck.

 

arunmmw
Fluorite | Level 6
This is the solution I got from your recommendation. Thanks

proc sort data=temp;
by group month_end ;
run;


data temp1;
set temp;
by group month_end;
retain Calculated_Goal Comm_Inc;
if first.group then do;
Comm_Inc = Goal*Comm;
Calculated_Goal = Goal+Comm_Inc;
Goal = Calculated_Goal;
end;
else do;
Comm_Inc = Calculated_Goal*Comm;
Calculated_Goal = Calculated_Goal+Comm_Inc;
end;

run;
mkeintz
PROC Star

There are a couple problems, both related to the fact that LAGs are queue-management functions, not "lookback" like in excel.

 

 

  1. Using lag function inside an IF condition.  This means that the 4-deep lag queue (from lag4(goal)) is not updated with every observation.  Typically you want the queue updated even if you are not using the result.  This could be easily fixed.
  2. More problematic, I think, is the fact the you have
       goal=lag4(goal) ;   
    which immediately updates the 4-item queue of goal values.  ... followed later by
       goal=goal+comm_inc
    which updates the value of goal in the program data vector, but does not update the corresponding element in the lag4 queue.  As a result that updated value of goal is not available to subsequent lag4 executions.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
arunmmw
Fluorite | Level 6

Thanks for helping me understand issues with this code. I can fix each of the code issues that you pointed out. Instead i changed the sort order, created new variable to get away without lag

mkeintz
PROC Star

This can be done with lag, even when sorted by monthend/group.

 

It's not initialy intutitive, but it does allow for a very simple program: 

 

data want (drop=oldgoal);
  oldgoal=lag3(goal);
  set temp;
  if _n_<=4 then goal=goal+comm_inc;
  else do;
    comm_inc=comm*oldgoal;
    goal=oldgoal+comm_inc;
  end;
run;

 

It has three main features:

  1. It takes the lag of goal and puts it in another var (oldgoal), as per @Astounding's suggestion.
  2. Because you were modifying the GOAL variable after retrieving its value, the lag queue was not getting the modified value.  To remedy this I moved the LAG function to precede the SET statement.  This allowed it to work from the modified values (but note it's the modified value of the PRIOR obs, which is about to be replaced by the SET statement).
  3. Instead of lag4 (for 4 regions: east/north/south/west), I have to use LAG3.  This is because I placed the lag function prior to the SET statement, so I need to go back one less entry.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2550 views
  • 1 like
  • 3 in conversation