I am using the lag function to identify when the same MemberID has a new AdmitID (002) and creating "Ignore" flag on that line. This part works for the first instance of the new AdmitID but I need all lines that are AdmitID = 002 to also say ignore as I am conditionally summing the data later. I tried adding a second if statement but have had no luck copying down the "ignore" where AdmitID = 002. Any help or suggestions for a better approach are appreciated/
data want; set have;
format maternity $char8.;
if MemberID = lag(MemberID)
and AdmitID ^= lag(AdmitID)
and dt_incurred >= lag(dt_adm)
and dt_incurred <= lag(dt_dischg) then Maternity = "Ignore";
if lag(Maternity) = "Ignore" and AdmitID=lag(AdmitID) then Maternity=lag(Maternity);
run;
This is what I am getting:
MemberID | AdmitID | Maternity |
AAA | 001 | |
AAA | 001 | |
AAA | 001 | |
AAA | 002 | Ignore |
AAA | 002 | |
AAA | 002 |
but this is what I am after:
MemberID | AdmitID | Maternity |
AAA | 001 | |
AAA | 001 | |
AAA | 001 | |
AAA | 002 | Ignore |
AAA | 002 | Ignore |
AAA | 002 | Ignore |
Like this?
data WANT;
set HAVE;
format MATERNITY $char8.;
if MEMBERID = lag(MEMBERID)
& ADMITID ^= lag(ADMITID)
/* & DT_INCURRED >= lag(DT_ADM)
& DT_INCURRED <= lag(DT_DISCHG) */then do;
IGNORE+1;
MATERNITY = "Ignore";
end;
if MEMBERID ne lag(MEMBERID) then IGNORE=0;
if IGNORE and ADMITID = lag(ADMITID) then Maternity='Ignore';
run;
"Conditional lag" immediately rings a very loud alarm bell.
The LAG() functions build a FIFO queue where a new element is added ONLY when the function is actually called, not when a data step iteration is executed or an observation is read or written.
The proper method for conditionally getting lagged values is to always (unconditionally) store the result of the function in a temporary variable (that you drop later), and use this variable in your conditional branch:
l_Maternity = lag(Maternity);
if l_Maternity = "Ignore" and AdmitID = lag(AdmitID) then Maternity = l_Maternity;
drop l_Maternity;
Using the function in a condition is not critical, as SAS does not optimize the conditions. In the above example, the lag(AdmitID) will be executed even if l_Maternity is not equal to "Ignore".
For the sake of efficiency (read: execution speed), it is always better to execute the function only once for each needed variable in a data step and use the temporary variables. Function calls are costly in terms of CPU cycles.
You should not use LAG for this. Retain the Maternity variable instead, much easier:
data want;
set have;
by MemberID AdmitID notsorted;
format maternity $char8.;
retain maternity;
if not first.MemberID and first.AdmitID
and dt_incurred >= lag(dt_adm)
and dt_incurred <= lag(dt_dischg) then Maternity = "Ignore";
else if first.AdmitID then Maternity=' ';
run;
Instead of checking the ID variables with LAG, just use BY and FIRST. I put in a NOTSORTED option on the by statement in case the data is actually sorted by something else.
The code looks a bit ineffective, as we check twice for the same condition (first.AdmitID), but as others have noted, it is important to make sure that the LAG function is called for every single observation.
You don't need to use LAG. Use BY group processing and RETAIN instead.
data have;
input MemberID $ AdmitID $ ;
cards;
AAA 001
AAA 001
AAA 001
AAA 002
AAA 002
AAA 002
;
data want;
set have ;
by memberid admitid;
length Maternity $8 ;
if first.memberid then Maternity=' ';
else if first.admitid then Maternity='Ignore';
retain Maternity;
run;
Member Admit Obs ID ID Maternity 1 AAA 001 2 AAA 001 3 AAA 001 4 AAA 002 Ignore 5 AAA 002 Ignore 6 AAA 002 Ignore
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.