Hi all, When I only use a single Key the below calculates correctly. But once I add in additional everything starts going out of whack and "miscalculating" (from my point of view). Why is this not calculating the difference in days from the PREVIOUS record by KEY?
Working version:
DATA HAVE;
INPUT Key1 Key2 DE ACTION $;
FORMAT DE DATETIME23.3;
INFORMAT DE DATETIME23.3;
DATALINES;
80040415 80006184 29OCT2019:15:35:00 1
80040415 80006184 29OCT2019:15:34:00 2
80040415 80006184 29OCT2019:15:09:00 1
80040415 80006184 29OCT2019:14:02:00 1
;
RUN;
DATA WANT;
SET HAVE;
BY Key2 Key1 DESCENDING DE;
IF ACTION = 1 THEN aFLAG = 1;
IF aFlag AND LAG1(Action) ne 1 THEN
DD = INTCK('dtday', DE, LAG1(DE));
RUN;
Not Working:
DATA HAVE;
INPUT Key1 Key2 DE ACTION $;
FORMAT DE DATETIME23.3;
INFORMAT DE DATETIME23.3;
DATALINES;
80004070 80006147 12NOV2019:09:45:00 1
80004070 80006147 05NOV2019:10:34:00 1
80040415 80006184 29OCT2019:15:35:00 1
80040415 80006184 29OCT2019:15:34:00 2
80040415 80006184 29OCT2019:15:09:00 1
80040415 80006184 29OCT2019:14:02:00 1
;
RUN;
DATA WANT;
SET HAVE;
BY Key2 Key1 DESCENDING DE;
IF ACTION = 1 THEN aFLAG = 1;
IF aFlag AND LAG1(Action) ne 1 THEN
DD = INTCK('dtday', DE, LAG1(DE));
RUN;
You must not use the lag() function conditionally, as it feeds its FIFO chain only when called.
DATA WANT;
SET HAVE;
BY Key2 Key1 DESCENDING DE;
l_de = lag(de);
IF ACTION = 1 THEN aFLAG = 1;
IF aFlag AND LAG1(Action) ne 1 THEN
DD = INTCK('dtday', DE, l_de);
drop l_de;
RUN;
You must not use the lag() function conditionally, as it feeds its FIFO chain only when called.
DATA WANT;
SET HAVE;
BY Key2 Key1 DESCENDING DE;
l_de = lag(de);
IF ACTION = 1 THEN aFLAG = 1;
IF aFlag AND LAG1(Action) ne 1 THEN
DD = INTCK('dtday', DE, l_de);
drop l_de;
RUN;
EDIT: ID10T error. If you read this already please ignore.
Thank you again for the help!
It is not advisable to use the LAG function conditionally - it wont work correctly otherwise. Put your LAG function in a statement that is executed for every row then use the lagged variable in your conditions:
Action_Lag1 = LAG1(Action);
Hi @Krueger When you get time, please read this classic explanation of LAG functionality by @hashman
07-30-2019 05:26 PM
When you use the LAGn function, you need to understand its nature. To recap:
-- LAGn is a queue of N items in memory occupying <item length>*N bytes. If the queue is numeric, <item length> = 8.
-- Every time LAGn is called for the same allocated queue, the item in the front of the queue is ejected (called dequeueing), and the value of the argument enters the rear of the queue (called enqueueing), displacing the rest of the items 1 position towards the front.
-- "For the same allocated queue" means that each time the compiler sees another reference to LAGn, it allocates a separate queue with N items. Thus,
x = lag (x) ;
x = lag (x) ;
is not at all the same as:
do i = 1 to 2 ;
x = lag (x) ;
end ;
This is because in the former case, the compiler has seen 2 LAG references and organized 2 separate, completely independent, queues. Thus, the first x=lag(x) causes the dequeing and enqueueing only in the first LAG queue, and the second x=lag(x) does the same only for the second queue. In the latter case, the compiler sees only one LAG reference and therefore organizes a single LAG queue, so that each time the DO loop iterates, the dequeing and enqueueing occur in the same, single, LAG queue.
-- Thus, a LAGn queue cannot be "cleared up" by doing anything with the variables, to which a call to the LAGn function assigns the dequeued value. It can be cleared up only by calling LAGn for the same queue (i.e. in a loop) N times, so that the item currently in the back of the queue is moved forward to the front of it and gets dequeued. The LAGn argument used in this action should be given the value to which you want the queue reinitialized - for example, a missing value or zero. This way, when after that you call LAGn again to create your assigned lag values, the first item dequeued will be that value.
After these prelim notes, it should be clear how to clear your queues before each BY group:
data have ;
input sn count ;
cards ;
11075652 12
11075652 4
11075652 3
11075652 1
11075652 1
11075682 1
11075682 2
11075682 2
11075682 2
11075682 0
11075682 2
run ;
data want ;
if 0 then set have ; /*keep the original variable order*/
count = 0 ; /*you want to initialize LAGs to 0*/
do _n_ = 1 to 6 ; /*loop 6 times, as LAG6 is longest*/
link lag ; /*use LINK to have the compiler see each LAGn just once*/
end ;
do until (last.sn) ;
set have ;
by sn ;
link lag ; /*use LINK to have the compiler see each LAGn just once*/
cum_sum = sum (count, of lag1-lag6) ;
output ;
end ;
return ;
lag: lag1 = lag1 (count) ;
lag2 = lag2 (count) ;
lag3 = lag3 (count) ;
lag4 = lag4 (count) ;
lag5 = lag5 (count) ;
lag6 = lag6 (count) ;
return ;
run ;
This way, the LAGn variables will remain 0 for the first N records in each BY group, as they should. Alternatively, you can reinitialize the queues with missing values - it won't affect CUM_SUM, but in this case, the LAGn variables will remain missing for the first N records in each BY group, and coding will be a bit terser:
data want ;
do until (last.sn) ;
set have ;
by sn ;
link lag ;
cum_sum = sum (count, of lag1-lag6) ;
output ;
end ;
count = . ;
do _n_ = 1 to 6 ;
link lag ;
end ;
return ;
lag: lag1 = lag1 (count) ;
lag2 = lag2 (count) ;
lag3 = lag3 (count) ;
lag4 = lag4 (count) ;
lag5 = lag5 (count) ;
lag6 = lag6 (count) ;
return ;
run ;
Kind regards
Paul D.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.