@FreelanceReinh Thank you for your detailed explanation of how the various LAG functions behave in this Data Step. This has been a good discussion. I initially thought this was some kind of bug, with the LAG function behaving weirdly, as in the case of the 5th observation. But your detailed analysis, showing that each occurrence of the LAG function has its own queue, explains this well and demonstrates that it is in fact not a bug but is working as designed. As mentioned in Chris's blog, the best way to avoid this kind of behavior is to assign lag(a) and lag(b) to two variables before the IF statements and then use those variables in the conditions of the IF statements. This ensures lag(a) and lag(b) are each executed exactly once during each iteration of the Data Step.
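As a minimal sketch of that workaround: the dataset names `have` and `want`, the variables `prev_a` and `prev_b`, and the three input rows are all made up for illustration and do not come from the original thread. The idea is only that each LAG call sits in an unconditional assignment, so its queue is updated on every observation.

```sas
/* Illustrative input only; names and values are assumptions for this sketch */
data have;
  input a b;
  datalines;
1 10
1 20
2 5
;
run;

data want;
  retain e 1;
  set have;
  /* Call each LAG exactly once per iteration, outside any condition, */
  /* so that both queues are fed on every observation.                */
  prev_a = lag(a);
  prev_b = lag(b);
  /* The conditions now compare against plain variables, not LAG calls */
  if a = prev_a and b > prev_b then e = e + 1;
  else if a ^= prev_a or prev_a = . then e = 1;
  drop prev_a prev_b;
run;

proc print data=want;
run;
```

Because the comparisons use `prev_a` and `prev_b` rather than LAG calls, it no longer matters which branch of the IF/ELSE is taken or in what order the conditions are written.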
The Data Step example can be simplified further, as below, to show the same kind of behavior, this time in the second observation only.
data test;
infile datalines dlm=',' dsd;
input a b;
datalines;
4272451,17878
4272451,17879
;
run;
data testLags;
  retain e f (1 1);
  set test;
  /* Each lag() occurrence below has its own queue, which is */
  /* updated only when that particular call is executed.     */
  if a=lag(a) and b>lag(b) then e=e+1;        /**(1)**/
  else if a^=lag(a) or lag(a)=. then e=1;     /**(2)**/
  if a^=lag(a) or lag(a)=. then f=1;          /**(3)**/
  else if a=lag(a) and b>lag(b) then f=f+1;   /**(4)**/
run;
proc print;
run;
Output :
Thanks also to others who participated in this discussion.