Hi,
I have a data set that contains two variables : the subject ID and the conclusion of a medical exam.
ID | CONCLUSION |
1 | Signs of arrhytmia. |
2 | Hypoplastic syndrome of left heart |
3 | Normal exam. |
4 | No sign of ventricular septal defect. Arrythmia. |
I need to assign a code (from another data set) according to what is written in the conclusion variable.
The data set containing the codes looks like this :
CODE | DESCRIPTION |
0.0 | Normal exam |
1.1 | Ventricular septal defect |
2.2 | Arrhytmia |
3.3 ... | Hypoplastic left heart syndrome ... |
I am looking for a way to match those two data sets and a function that would assign a code even if the expression is not the exact same. In addition to that, if it is written "no sign of", is there a way to not assign the code? Plus, if the word order is not the exact same, is it possible to assign the code anyway (see the hypoplastic left heart syndrome in my exemple).
I specify that it has to be as automatic as possible because my data set has about 20,000 observations.
Here is an exemple of what I am looking for :
ID | CONCLUSION | CODE |
1 | Signs of arrhytmia | 2.2 |
2 | Hypoplastic syndrome of left heart | 3.3 |
3 | Normal exam. | 0.0 |
4 | No sign of ventricular septal defect. Arrythmia. | 2.2 |
If it is not clear enough, please tell me. I will try to make it more clear.
Thank you in advance.
Your question is very clear, but sadly this is a VERY challenging type of requirement to implement.
SAS has some very sophisticated text analytics tools, which cost a very sophisticated price. On the off chance that you have access to one of them, use it!
Otherwise, I think you're going to need to do some pre-processing of your "Conclusion" field, to try to get it into a reasonable form for matching (for example, the "no sign of..." type construct is always going to be a huge problem). SAS has two text matching functions, COMPGED and COMPLEV. I'm attaching an example of how it could be used below; the only problem is that the results aren't correct.
Hopefully this will move you another step closer.
Tom
data Exams;
length Conclusion $30;
input ID Conclusion &;
cards;
1 Signs of arrhytmia.
2 Hypoplastic syndrome of left heart
3 Normal exam.
4 No sign of ventricular septal defect. Arrythmia.
run;
data Descriptions;
length Description $30;
input Code Description &;
cards;
0.0 Normal exam
1.1 Ventricular septal defect
2.2 Arrhytmia
3.3 Hypoplastic left heart syndrome
run;
proc sql noprint;
create table Combined as
select e.ID, e.Conclusion, d.Code, d.Description, complev(e.Conclusion, d.Description, 1000, "il") as MatchScore
from Exams e cross join Descriptions d
order by MatchScore;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.