Hey all,
I currently attempting to create a new database that is just an improved version of one already in existence. One of the big problems (and massive eyesore) of the old dataset is that a particular variable (genotype) is a free text categorical variable so if you misspelled a diagnosis or accidentally added an extra space it counts that as a new level of that variable.
To paint the picture of how annoying this is and why we need to change it for the new database: This variable has approximately 35 levels that we are interested in but the current database has about 117 unique values for that variable.
I want to consolidate the old diagnoses into the new format (which is a categorical variable that takes a value between 1-13, where a value of 12 allows for a drop-down box to free text if in Redcap). Is there a way to quicken the process of changing the levels so that they don't repeat with weird (and useless) additions or am I about to just write a massive "if-then" statement?
Ok, I was worried that was probably the better, albeit not quickest, solution. It will be a one-time process (in theory) since once the new database goes live your only choice will be to select a value between 1-13 corresponding to that diagnosis.
When you say fuzzy matching, are you talking about the COMPGED function?
Thanks for your help
If this is a one time process and only about mapping 117 values to 35 then doing this "manually" in code appears as the quickes solution to me.
Instead of IF/THEN you could create and use a SAS format.
Nearly 200 sessions are now available on demand in the Innovate Hub.
Watch Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.