I want to use full-text search to find possible and probable cases of a disease in a cohort of people. What would you suggest?
For example, I have two variables: PatientNumber and Text. In some cases the text tells me immediately ("Possible diagnosis"), but in others it never uses "possible" or "probable." I doubt I can search for "might have diagnosis," because I don't think doctors write that, but I don't want to exclude possible cases just because the text lacks the word "possible" or "probable."
Any ideas would be useful. Please help!
Take a sample of the unclassified text (i.e., the records that contain neither "probable" nor "possible"). By the way, you did screen for negations such as "not probable," right? Look for expressions in the sample that you conclude mean probable or possible, and add them to your classification logic. Apply the updated logic to your data, thereby reducing the number of unclassified records.
Repeat the above with a sample of the remaining unclassified text.
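The loop described above might look something like the following. This is only a rough sketch in Python rather than SAS, and everything in it is an assumption made for illustration: the pattern lists, the `classify` helper, the made-up sample records, and the choice of "suspected" as a phrase added after review. The same idea could be expressed in a SAS data step with pattern-matching functions.

```python
import re

# Hypothetical starting patterns; extend them after reviewing each sample.
UNCERTAIN_PATTERNS = [r"\bpossible\b", r"\bprobable\b"]
NEGATION_PATTERNS = [r"\bnot\s+(?:possible|probable)\b"]


def classify(text, uncertain=UNCERTAIN_PATTERNS, negated=NEGATION_PATTERNS):
    """Return 'negated', 'uncertain', or 'unclassified' for one note."""
    lowered = text.lower()
    # Check negations first so "not probable" is not counted as "probable".
    if any(re.search(p, lowered) for p in negated):
        return "negated"
    if any(re.search(p, lowered) for p in uncertain):
        return "uncertain"
    return "unclassified"


# Made-up records standing in for (PatientNumber, Text).
records = [
    (1, "Possible diagnosis of asthma"),
    (2, "Asthma not probable"),
    (3, "Suspected asthma, follow up in 2 weeks"),
    (4, "Routine visit"),
]

# Round 1: classify with the starting patterns and list what is left over.
labels = {pid: classify(text) for pid, text in records}
unclassified = [pid for pid, label in labels.items() if label == "unclassified"]

# Round 2: suppose a manual review of the leftovers finds that "suspected"
# means possible. Add it to the logic and reclassify.
extended = UNCERTAIN_PATTERNS + [r"\bsuspected\b"]
labels_round2 = {pid: classify(text, uncertain=extended) for pid, text in records}
```

Each round shrinks the unclassified pool. The manual review of a sample between rounds is what decides which phrases get added, so the code only records those decisions.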