I want to find possible and probable cases of a disease in a cohort of people using full-text search. What approaches would you suggest?
For example, I have two variables: PatientNumber and Text. In some cases, the text tells me immediately - "Possible diagnosis" - but other times, the text never uses "possible" or "probable." I don't know if I can search for "might have diagnosis" because I don't think doctors write that, but I don't want to eliminate possible cases just because the text doesn't contain the word "possible" or "probable."
Any ideas would be useful. Please help!
Take a sample of the unclassified text (i.e., the records that contain neither "probable" nor "possible"). By the way, you did screen for negations such as "not probable", right? Review the sample for expressions that you conclude mean probable or possible. Add those expressions to your classification logic, then apply the updated logic to your data, thereby reducing the number of unclassified records.
Repeat the above with a sample of the remaining unclassified text.
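The review loop described above could be sketched roughly as follows. This is only an illustration in Python, not SAS code and not a validated clinical rule set: the sample records, the label names, the negation pattern, and the added "suspected" expression are all hypothetical stand-ins for whatever a manual review of your own data turns up.

```python
import re

# Hypothetical starting rules; extend UNCERTAIN_PATTERNS after each review round.
UNCERTAIN_PATTERNS = [r"\bpossible\b", r"\bprobable\b"]
# Hypothetical negation screen, checked first so "not probable" is not counted.
NEGATED_PATTERNS = [r"\b(?:not|no)\s+(?:possible|probable)\b"]


def classify(text, uncertain_patterns, negated_patterns=NEGATED_PATTERNS):
    """Return 'negated', 'possible_or_probable', or 'unclassified' for one note."""
    lowered = text.lower()
    if any(re.search(p, lowered) for p in negated_patterns):
        return "negated"
    if any(re.search(p, lowered) for p in uncertain_patterns):
        return "possible_or_probable"
    return "unclassified"


def run_round(records, uncertain_patterns):
    """Classify every (PatientNumber, Text) pair with the current rule set."""
    return {pid: classify(text, uncertain_patterns) for pid, text in records}


# Made-up example rows standing in for PatientNumber and Text.
records = [
    (1, "Possible diagnosis of asthma."),
    (2, "Asthma not probable."),
    (3, "Suspected asthma, awaiting spirometry."),
    (4, "Routine follow-up."),
]

# Round 1: only the explicit words are recognised.
round1 = run_round(records, UNCERTAIN_PATTERNS)
unclassified = [pid for pid, label in round1.items() if label == "unclassified"]

# After reading a sample of the unclassified notes, suppose the reviewer decides
# that "suspected" should count as possible/probable (an assumption for this demo).
round2 = run_round(records, UNCERTAIN_PATTERNS + [r"\bsuspected\b"])
still_unclassified = [pid for pid, label in round2.items() if label == "unclassified"]
```

Each round shrinks the unclassified pool, and the notes that remain in `still_unclassified` are the ones to sample for the next review.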