Hello,
I have a long list of entries for a categorical variable classified into levels in an RTF document.
What is the most efficient way to reclassify each observation into a new numeric variable? There is a lot of variation in punctuation between apparently similar observations, which I've classified manually.
Sample code, which works but is untenable for the hundreds of individual variants of each level:
data new;
set old;
length fruit_num 8.;
if fruit = 'Apple' or fruit = 'Apples' or fruit = 'apple' then fruit_num = 1;
else if fruit = 'Orange' or fruit = 'Oranges' or fruit = ' organge' then fruit_num = 2;
else fruit_num = .;
run;