If your data volume is small enough, what I would do is:
* Get a distinct list of your subjects (either proc sort nodupkey or proc sql select distinct)
* Review the distinct list for misspelled values.
* For misspelled data, create a column such as correct_spelling with the correct spelling.
You've now just created your custom dictionary.
With each new data feed, append to your custom dictionary.
Join the dictionary back to your data, replacing subject with correct_spelling wherever correct_spelling is not missing.
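As a rough sketch of the steps above, something like the following could work. The dataset names (work.have, work.subject_list, work.subject_dict, work.want), the id column, the subject_clean column, and the sample values are all made up for illustration; only subject and correct_spelling come from the description above. The dictionary is typed in by hand here, whereas in practice you would build it while reviewing the distinct list.

```sas
/* Hypothetical sample data standing in for your real feed */
data work.have;
  length subject $50;
  infile datalines dsd;
  input id subject $;
  datalines;
1,Mathematics
2,Mathmatics
3,Chemestry
;
run;

/* Step 1: distinct list of subjects to review by eye */
proc sql;
  create table work.subject_list as
  select distinct subject
  from work.have
  order by subject;
quit;

/* Step 2: the custom dictionary, one row per misspelled value.
   Entered manually here (assumption); yours comes from the review. */
data work.subject_dict;
  length subject correct_spelling $50;
  infile datalines dsd;
  input subject $ correct_spelling $;
  datalines;
Mathmatics,Mathematics
Chemestry,Chemistry
;
run;

/* Step 3: join back; take correct_spelling when it is present,
   otherwise keep the original subject */
proc sql;
  create table work.want as
  select h.id,
         h.subject as subject_raw,
         coalesce(d.correct_spelling, h.subject) as subject_clean
  from work.have as h
  left join work.subject_dict as d
    on h.subject = d.subject;
quit;
```

Writing the result to a separate column (subject_clean here) keeps the raw value around for auditing; you can overwrite subject instead if you do not need it. For a new feed, rerun step 1 on the new data, review only the values not already in the dictionary, and add them with something like proc append.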
The compbl function can help with multiple spaces, since it collapses runs of consecutive blanks into a single blank:
data test;
have='This  is   a string    with multiple     spaces';
want=compbl(have);
run;