Hey all,
I currently attempting to create a new database that is just an improved version of one already in existence. One of the big problems (and massive eyesore) of the old dataset is that a particular variable (genotype) is a free text categorical variable so if you misspelled a diagnosis or accidentally added an extra space it counts that as a new level of that variable.
To paint the picture of how annoying this is and why we need to change it for the new database: This variable has approximately 35 levels that we are interested in but the current database has about 117 unique values for that variable.
I want to consolidate the old diagnoses into the new format (which is a categorical variable that takes a value between 1-13, where a value of 12 allows for a drop-down box to free text if in Redcap). Is there a way to quicken the process of changing the levels so that they don't repeat with weird (and useless) additions or am I about to just write a massive "if-then" statement?
Ok, I was worried that was probably the better, albeit not quickest, solution. It will be a one-time process (in theory) since once the new database goes live your only choice will be to select a value between 1-13 corresponding to that diagnosis.
When you say fuzzy matching, are you talking about the COMPGED function?
Thanks for your help
If this is a one time process and only about mapping 117 values to 35 then doing this "manually" in code appears as the quickes solution to me.
Instead of IF/THEN you could create and use a SAS format.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.