Hey all,
I am currently attempting to create a new database that is just an improved version of one already in existence. One of the big problems (and a massive eyesore) in the old dataset is that a particular variable (genotype) is a free-text categorical variable, so if you misspelled a diagnosis or accidentally added an extra space, it counts as a new level of that variable.
To paint the picture of how annoying this is and why we need to change it for the new database: this variable has approximately 35 levels that we are interested in, but the current database has about 117 unique values for it.
I want to consolidate the old diagnoses into the new format (a categorical variable that takes a value from 1 to 13, where a value of 12 allows free-text entry in REDCap). Is there a way to speed up the recoding so that the levels don't repeat with weird (and useless) variations, or am I about to write a massive "if-then" statement?
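To make the question concrete, here is a rough sketch of the kind of thing I am hoping exists, written in Python only for illustration. It cleans up case and whitespace, looks the value up in a table, and falls back to a fuzzy match for misspellings. The labels in CANONICAL are made-up placeholders (not our real genotypes), the 0.85 cutoff is a guess, and treating 12 as the catch-all code is my assumption based on the free-text option described above.

```python
import difflib
import re

# Hypothetical label -> code table; the real one would come from the new codebook.
CANONICAL = {
    "type a": 1,
    "type b": 2,
    "type c": 3,
}
OTHER_CODE = 12  # assumed catch-all code that allows free text


def normalize(raw):
    """Lowercase, trim, and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def recode(raw, cutoff=0.85):
    """Return (code, matched_label); unmatched values get OTHER_CODE and None."""
    key = normalize(raw)
    if key in CANONICAL:
        return CANONICAL[key], key
    # Fuzzy fallback for misspellings; cutoff is an untuned guess.
    close = difflib.get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    if close:
        return CANONICAL[close[0]], close[0]
    return OTHER_CODE, None
```

Running recode over the ~117 unique old values (rather than every row) would give a short table to review by hand before applying it, which seems safer than trusting the fuzzy matches blindly.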