I have two data sets. One is the main data set "MAIN" which has the variables ptid (patient id), hosp (name of hospital) and dx (principal diagnosis). The other data set "DIAGNOSES" has just one variable DX, which is a list of common diagnoses. I want to see if the diagnosis in MAIN is already in the data set DIAGNOSES.
data main; input @1 ptid $4 @6 hosp $6 @10 dx $65;
cards;
abcd JHop acute myocardial infarction w/ comorbidity
swer UMD acute myocardial infarction with comorbidity
pppp StJo diabetes
......
;
run;
data diagnoses; input @1 dx $65;
cards;
acute myocardial infarction w/ comorbidity
diabetes
gastric obstruction
hematoma
......
;
Normally getting the two rable to match merge would be simple. The problem is that the variable dx in MAIN can have multiple ways of being captured. For example, these should all be the same dx:
acute myocardial infarction w/ comorbidity
acute myocardial infarction with comorbidity
acute myocardial infarction with / comorbidity
A similar but different diagnosis is:
acute myocardial infarction w/ major comorbidity
acute myocardial infarction, major comorbidity
acute myocardial infarction & major comorbidity
These small spelling quirks make direct matching impossible (there are close to 1000 values for dx in DIAGNOSES).
Is there any way to match on similar but not exactly the same text string?
Thanks!
Andrew
O.K., I think I've found a way to do this. There are a few functions in SAS that compare two strings for the number of characters in one but not the other. So I can use two do lookups. One that looks up all the entries in the DIAGNOSES table into an array, and another that matches each entry in the array to diagnoses in the MAIN table. The key function is COMPLEV.
data match (keep = dx ptid hosp)
close (keep = dx ptid distance possible_dx);
array dxs[808] $70 _temporary_;
do until (done);
set diagnoses end=done;
count+1;
dxs[count] = dx;
end;
do until (checkdone);
set main end=checkdone;
do i = 1 to dim(dxs);
distance = complev(dx, dxs[i],'iln');
if distance=0 then do;
output match;
leave;
end;
else if distance <= 10 then do;
possible_dx = dxs[i];
output close;
end;
end;
end;
stop;
run;
I'll need to combine MATCH and CLOSE, sort by dx and descending distance, and take the "first.dx".
A good article on the complev function can be found at: https://support.sas.com/kb/48/582.html
Andrew
I might suggest starting by creating a second variable, so you maintain the origin value for verification by people at different points, and "clean" that second variable such as removing "&" or "w/" or similar, make sure the number of spaces between words is consistent, case, expand or consistently replace words with abbreviations and such.
Then start attempting to match.
Or if the data has any ICD diagnosis codes start there. Of course if that is the goal that wouldn't be available would it.
O.K., I think I've found a way to do this. There are a few functions in SAS that compare two strings for the number of characters in one but not the other. So I can use two do lookups. One that looks up all the entries in the DIAGNOSES table into an array, and another that matches each entry in the array to diagnoses in the MAIN table. The key function is COMPLEV.
data match (keep = dx ptid hosp)
close (keep = dx ptid distance possible_dx);
array dxs[808] $70 _temporary_;
do until (done);
set diagnoses end=done;
count+1;
dxs[count] = dx;
end;
do until (checkdone);
set main end=checkdone;
do i = 1 to dim(dxs);
distance = complev(dx, dxs[i],'iln');
if distance=0 then do;
output match;
leave;
end;
else if distance <= 10 then do;
possible_dx = dxs[i];
output close;
end;
end;
end;
stop;
run;
I'll need to combine MATCH and CLOSE, sort by dx and descending distance, and take the "first.dx".
A good article on the complev function can be found at: https://support.sas.com/kb/48/582.html
Andrew
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.