All,
Thank you in advance for taking a look at this quandary. I know there is a way to do this, and I might be really close to solving it, but at this point, I can't see the forest for the trees.
data have;
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. hospital $ ;
format clm_from_dt clm_thru_dt mmddyy10.;
datalines;
88575 31 16 389 12/19/2018 12/20/2018 Chandler
88575 31 34 387 01/30/2019 01/31/2019 Chandler
88575 33 34 387 09/14/2019 09/18/2019 Chandler
88575 33 19 330 10/02/2019 10/07/2019 Chandler
88575 33 00 394 10/13/2019 10/15/2019 Chandler
88575 33 34 386 11/02/2019 11/02/2019 Chandler
88575 33 00 699 11/16/2019 11/21/2019 GoodSam
;
run;
I am trying to create flags that help me to determine whether or not a line is an anchor.
The rules are:
An anchor can be the first record in a patient grouping OR when the epiid/epigrp combination changes OR if there are more than 90 days between the clm_from_dt and the prior clm_thru_dt OR if the hospital changes.
The output should resemble this:
data want;
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. hospital $ anch_flag $ readm_flg $;
format clm_from_dt clm_thru_dt mmddyy10.;
datalines;
88575 31 16 389 12/19/2018 12/20/2018 Chandler y
88575 31 34 381 01/30/2019 01/31/2019 Chandler y
88575 33 34 387 09/14/2019 09/18/2019 Chandler y
88575 33 19 330 10/02/2019 10/07/2019 Chandler y
88575 33 00 394 10/13/2019 10/15/2019 Chandler x
88575 33 34 386 11/02/2019 11/02/2019 Chandler x
88575 33 00 699 11/16/2019 11/21/2019 GoodSam x
;
run;
I don't really need the 'x' in the output. Just did it for the import.
As you can see, the first line is an anchor because it is the first record in the series.
The second line is an anchor because the epiid/epigrp changed
The third line is an anchor because it is greater than 90 days
The fourth line is an anchor because the epiid/epigrp changed
The fifth line is not an anchor because the epiid remained the same (when there isn't an epigrp, it leans on the epiid only)
The sixth line is not an anchor because of the anchor on 9/14 is still within 90 days.
The seventh line is not an anchor because it is still within 90 days of the anchor on 10/2 (line 4)
This is clearly an example and there are situations were a pat_id only has one record and others where there are 10 records. My struggle is with the massive OR rule I created above and wondering if it makes more sense to run several data steps (which I think are easier to read) or something that does this all at once.
The continued 90 day lookback is causing the angst. I have created a series of lags with the dates and epigrp to attempt to 'flag' the correct lines, but I am really struggling. I would paste what I've done, but it's kind of a mess and to be honest, I'm a little embarrassed at how this hasn't worked.
Any direction would be greatly appreciated.
-Brian
Here is a solution:
data want;
set have;
by pat_id epiid epigrp notsorted;
grp0=epigrp='00' or lag(epigrp)='00';
datedif=clm_from_dt-lag(clm_thru_dt);
if first.epiid then /* new patient or epiid changed */
flag='y';
else if datedif>90 then /* the datedif criterium only works within the same pat_id */
flag='y';
else if first.epigrp and not grp0 then
flag='y';
drop grp0 datedif;
run;
The important thing is to calculate the variables depending on the LAG function before diving into IF ... THEN ... ELSE. If they are not calculated every time, results will be unpredictable, and probably wrong.
I would have thought
88575 33 34 386 11/02/2019 11/02/2019 Chandler x
to be Y for the reason there is a change from 33 00 to 33 34. Though 33 00 would mean, the combo would lean to epiid 33, shouldn't 34 still be considered a change?
Sorry. I do understand the 90 days logic albeit the epiid/epigrp combination is still seemingly unclear. Perhaps i'm overlooking something that may make me my question sound naive.
@shounster wrote:
The rules are:
An anchor can be:
- the first record in a patient grouping OR
- when the epiid/epigrp combination changes OR
- if there are more than 90 days between the clm_from_dt and the prior clm_thru_dt OR
- if the hospital changes.
1, 2, & 4 should be achievable using BY group processing. First get that working so you have your groups more easily identified and numbered. Then go through again and figure out the 90 day issue and that will use lag only on the date. Then you can split sessions up again as needed. IME this is typically the approach used, does it align with what you're thinking.
For BY group numbering see this post:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
With this complex set of rules I would suggest building it by steps.
First for each PIECE of your rule create a 1/0 valued variable where 1 is true and 0 false.
This can be done fairly easily with statements like:
This would create a value of 1 when var1=3 and 0 otherwise.
Rule1 = (var1 = 3);
Once you have these individual little pieces getting the correct logical assignment with a bunch of variables you can start building your "or" combinations. This bit of "pieces" is because you have values that are not equal that you are attempting to treat as equal, your epigrp variable when 00 is an example.
And you may need to build the result up in multiple stages.
If you use the lag function do not use it as part of an IF statement as the queue behavior means things likely aren't what you expect.
You wrote:
An anchor can be the first record in a patient grouping OR when the epiid/epigrp combination changes OR if there are more than 90 days between the clm_from_dt and the prior clm_thru_dt OR if the hospital changes.
But then
The seventh line is not an anchor because it is still within 90 days of the anchor on 10/2 (line 4)
Why is there a comparison with line 4? The last rule is fulfilled, hospital changes from "Chandler" to "GoodSam", from your description, i expected the 7th line to be an anchor.
I think you need to explain the interactions between those rules.
Here is a solution:
data want;
set have;
by pat_id epiid epigrp notsorted;
grp0=epigrp='00' or lag(epigrp)='00';
datedif=clm_from_dt-lag(clm_thru_dt);
if first.epiid then /* new patient or epiid changed */
flag='y';
else if datedif>90 then /* the datedif criterium only works within the same pat_id */
flag='y';
else if first.epigrp and not grp0 then
flag='y';
drop grp0 datedif;
run;
The important thing is to calculate the variables depending on the LAG function before diving into IF ... THEN ... ELSE. If they are not calculated every time, results will be unpredictable, and probably wrong.
Thank you all for taking a few minutes to review this issue. I was able to utilize a little bit of everyone's response, so thanks for that. I did receive more clarity from those that asked this of me and there were some changes (of course). I at least have some tools to work with the changes.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.