I'm trying to identify patients who are hospitalized based on a primary diagnosis that is included in certain groups of diagnoses. These groups are called PQIs. There are also exclusions that would exclude patients from these groups. Exclusions can be diagnoses or medical procedures, and they can come from any DX or proc variable. They don't have to be a primary diagnosis.
I have a file with medical claims for each patient, with 1 record per hospitalization. Each record has 25 DX vars and 25 proc vars (I'm only showing 3 of each here). The hospitalizations should be flagged independently of each other.
I have another file with a flag associated with each DX or proc code related to each PQI. So the first step is to compare the DX inclusion codes against the primary diagnosis variable in the claims file, and then flag any hospitalization that matches. The next step is to match the DX or proc exclusion codes against any of the 25 DX or proc vars in the claims file, and flag the matches.
I set up a hash to do this, based on similar work I did a few years ago. But it's not flagging correctly and I don't know why. This is what it should look like:
Obs |
FLAG_INC_PQI5 |
FLAG_EXC_PQI5 |
FLAG_INC_PQI7 |
FLAG_EXC_PQI7 |
BENE_ID |
ADMSN_DT |
1 |
1 |
1 |
0 |
0 |
12345 |
21223 |
2 |
0 |
0 |
0 |
0 |
12345 |
21339 |
3 |
0 |
0 |
0 |
1 |
23456 |
21253 |
data have;
infile datalines dsd dlm=',' truncover;
input BENE_ID ADMSN_DT date9. DGNS_1_CD $ DGNS_2_CD $ DGNS_3_CD $ SRGCL_PRCDR_1_CD $ SRGCL_PRCDR_2_CD $ SRGCL_PRCDR_3_CD $;
datalines;
12345,08FEB2018,J410,E118,E125,04CL3ZZ,"",""
12345,04JUN2018,L110,E185,E120,08DG3ZZ,"",""
23456,10MAR2018,E185,L110,E120,04CL3ZZ,"",""
;RUN;
data flags;
infile datalines dsd dlm=',' truncover;
input FLAG_INC_PQI5 FLAG_EXC_PQI5 FLAG_INC_PQI7 FLAG_EXC_PQI7 INC_DX_code $ EXC_DX_code $ EXC_proc_code $;
datalines;
1,0,0,0,J410,.,.
0,1,0,0,.,E118,.
0,0,0,1,.,.,04CL3ZZ
;RUN;
data want(keep=BENE_ID ADMSN_DT FLAG_:);
LENGTH INC_DX_Code EXC_DX_Code EXC_proc_Code $ 8;
if _n_=1 then
do;
if 0 then set flags;
dcl hash h_inc_dx(dataset:"flags");
h_inc_dx.defineKey('INC_DX_Code');
h_inc_dx.defineData(all:'y');
h_inc_dx.defineDone();
dcl hash h_exc_dx(dataset:"flags");
h_exc_dx.defineKey('EXC_DX_Code');
h_exc_dx.defineData(all:'y');
h_exc_dx.defineDone();
dcl hash h_exc_proc (dataset:"flags");
h_exc_proc.defineKey('EXC_proc_Code');
h_exc_proc.defineData(all:'y');
h_exc_proc.defineDone();
end;
call missing(of _all_);
set have;
/*Inclusion DX codes apply to Principal DX (DGNS_1_CD) only."*/
Array dx_inc_codes $ DGNS_1_CD;
Array dx_exc_codes $ DGNS_1_CD DGNS_2_CD DGNS_3_CD;
Array proc_exc_codes $ SRGCL_PRCDR_1_CD SRGCL_PRCDR_2_CD SRGCL_PRCDR_3_CD;
array _a_dx_inc $ DGNS_1_CD;
do _i=1 to dim(_a_dx_inc);
if h_inc_dx.find(key:_a_dx_inc[_i])=0 then
do;
output;
/*return;--The return statement writes a row to output as soon as it finds a matching
flag and then stops further checks. But since there can be multiple flag matches per record, it should be
commented out*/
end;
end;
array _a_dx_exc $ ADMTG_DGNS_CD DGNS_1_CD DGNS_2_CD DGNS_3_CD;
do _i=1 to dim(_a_dx_exc);
if h_exc_dx.find(key:_a_dx_exc[_i])=0 then
do;
output;
/*return;*/
end;
end;
array _a_proc_exc $ SRGCL_PRCDR_1_CD SRGCL_PRCDR_2_CD SRGCL_PRCDR_3_CD;
do _i=1 to dim(_a_proc_exc);
if h_exc_proc.find(key:_a_proc_exc[_i])=0 then
do;
output;
/*return;*/
end;
end;
/*Set up an array to recode missing values to 0*/
Array _arr(*) FLAG_:;
Do i=1 to dim(_arr);
If _arr(i)=. then _arr(i)=0;
End;
Drop i;
RUN;
Given that example data which are correct, which are incorrect and how to do we know the difference?
If you run the code and look at the 'want' file, you will see that it does not match the table I included in my original post. For example, ADMSN_DT 21339 shouldn't have any flags (because none of the DX or proc codes match any of the DX or proc codes for the flags), and yet it is flagged for both FLAG_INC_PQI5 and FLAG_EXC_PQI5
The output data set created by your code has an Admsn_dt. The shown expected values doesn't. Perhaps the Admsn_dt would be a clue if we know what was expected.
Good idea to format date variables.
Since the example data set I get running your code has 15 observations not 3, I am not quite sure what I am supposed to compare. If I am supposed to get 3 then I would suspect that the OUTPUT statements and the timing of their execution are involved.
The formatting for the ADSMN_DT variable is being dropped somewhere along the way. The values are the same (the number of days since Jan 1 1960) but the way they're displayed is different.
08FEB2018 = 21223
04JUN2018 = 21339
10MAR2018 = 21253
There should only be 3 records in the output file, and as you correctly point out, there are 15. Furthermore, the flags are not correct. For example, the 04JUN2018 / 21339 record(s) should be 0 for all flags.
I do not understand the structure of your FLAGS dataset.
From your verbal description I was expecting to see something along the lines of of a dataset with four variables.
PQI ICD INCLUDE EXCLUDE
Where PQI is the grouping, ICD is the diagnosis/procedure code and INCLUDE and EXCLUDE are flags to indicate if that code indicates the the PQI applies or does not apply.
The idea is that a patient must have an inclusion code in the primary DX variable in order to be included in the PQI. However, this is a necessary but not sufficient criterion to be included. If the patient has an inclusion code but also has an exclusion code (remember that exclusion codes apply to any DX or proc variable), then they will be excluded.
The idea is that we want to identify "simple" cases with a given health condition. PQI7 represents hypertension, and we want to review the treatment these patients receive. But if a patient has hypertension but also receives a coronary bypass, that is a complex case and it will require different treatment. We want to exclude that case from our review group.
If you wanted to check for ONE group, say HBP, you would first check if they meet the inclusion criteria. Then IF they do check if the meet any of the exclusion criteria.
So let's say you want to create a 0/1 flag to indicate if this observations indications HBP.
So your looping might look like (in sort of pseudo code).
hbp=0;
do index=1 to dim(array1) while (not hbp);
if array1[index] in (list_of_included_codes_for_HBP) then HBP=1;
end;
do index=1 to dim(array2) while (hbp);
if array2[index] in (list_of_excluded_code_for_HBP) then HBP=0;
end;
You might be able to use hash objects to effect the tests in the IF statements.
You might want to make a third (or fourth) DO loop to handle procedure codes in addition to diagnosis codes.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.