Hi all,
I am importing data over from an excel sheet into Sas EG. I have to create a new data set from that data where there are no mispellings, extra spaces, etc. I have a Character Variable, Subject, where some of it's observations are mispelled and i need to fix them. please help with the correct data step procedure.
Not a trivial exercise and made harder by not providing any example data.
What types of values do you have? People names (misspelling almost impossible to address), company names, City names, drugs, machinery ?
Do you know what the correct spellings should be for things? Are the misspellings in the middle of other text such as the middle of a sentence?
If you have single values that is misspelled such as "BIM" instead of "IBM" then there are tests that can be done to determine if the word is similar to another word and if similar enough you might replace it with a value from a known list. But we would kind of need to have some examples of the misspelled value and the correct value.
Of if feeling brave Excel does have a spell checker with autocorrect built in. Try that before importing.
Here is an alternate approach using a format that will do the same thing that @Reeza's example does.
proc format library=work; value $rename 'Alfred' = 'Alfie' 'John' = 'Johnny' ; run; data example; set sashelp.class; name = put(name,$rename.); run;
The format used this way can seriously reduce the number of IF/THEN/ELSE or other logic statements. And if you are doing this repeatedly with multiple data sets such that new errors appear they can be easily added to the format.
Suppose I have developed a format that works for your example for accounting which could look like:
proc format library=work; value $rename 'acconting' = 'accounting' ; run;
and then the next time a file needs to be processed you find a value of 'acounting'. Just add it to the format:
proc format library=work; value $rename 'acconting', 'acounting' = 'accounting' ; run;
No additional code changes need be made.
This format approach works because any value that does not appear on the left of an = sign in the value block is not reformatted. So the correct values are left alone.
PS: From the description I would suspect that you might be changing "mth" to "math" not "match".
And you can get a reasonable start on the values that go on the left by using code like:
Proc sql; select distinct subject from yourdataset ; quit;
Which will create a list in the results window of all the values of subject that you can copy and paste into the editor for proc format. Remove the correct spellings. The value lists in proc format do not have to be all on one line so making a value statement like this should be moderately easy.
proc format library=work; value $rename 'acconting' , 'acounting' = 'accounting' 'matth' , 'mth' = 'math' ; run;
These are usually brute force unfortunately, unless you have systematic rules that need to be applied.
data clean;
set sashelp.class;
if name = 'Alfred' then name = 'Alfie';
else if name = 'John' then name = 'Johnny';
run;
@dkeuch816 wrote:
Hi all,
I am importing data over from an excel sheet into Sas EG. I have to create a new data set from that data where there are no mispellings, extra spaces, etc. I have a Character Variable, Subject, where some of it's observations are mispelled and i need to fix them. please help with the correct data step procedure.
If your data volume is small enough, what I would do is:
* Get a distinct list of your subjects (either proc sort nodupkey or proc sql select distinct)
* Review the distinct list of data.
* For misspelled data, create a column such as correct_spelling with the correct spelling.
You've now just created your custom dictionary.
With each new data feed, append to your custom dictionary.
Rejoin back to your data, assigning correct_spelling to subject where correct_spelling is not missing.
The compbl function can help with multiple spaces:
data test;
have='This is a string with multiple spaces';
want=compbl(have);
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.