10-11-2017 08:27 PM
I have a string variable called degree that contains student's majors. My problem is that the majors were written in so they are not uniform.
For example: for history majors, some say "history" and some say "mod-hist". Some say "stat" while some say "statistics". Some are capitalized, some are not.
I need to make all of the majors have a uniform label. I want all of the history majors to have "history" as their major. All of the stats majors to have "statistics" as their major. (etc).
What is the most simple way to do this?
I was trying to find somewhere to search for a substring such as "hist" and then replace that WHOLE string value (not just the substring) with "history" but I haven't found a way to do that.
10-11-2017 11:20 PM
I would start by making everything Proper or Upper Case.
Then remove all the obvious values that are already ok.
Then make your rules and recode things. It may mean a lot of IF/THEN statements.
Look at the character functions in the documentation (search SAS functions by category) to see useful functions such as COMPRESS, COMBL, FIND and INDEX.
10-12-2017 05:03 AM
I would be doing something similar to what @Reeza suggests:
1. compbl() and propcase() your source variable
2. create a table with distinct values
3. use these values to create a format (i.e. export the table to Excel, add your standardized labels, re-import the Excel and create a format out of it)
4. apply the format to the original values (with compl() and propcase() applied first) to standardize the strings.
10-12-2017 11:25 AM
You may have to very carefully consider the order of your processing and the actual desired target(s).
ExampleS: Art History,Secondary Education- History emphasis
Is Art History an Art degree or History or possibly Humanities? If it is Art then you would have to search and replace 'Art' before your "hist" search AND the replacement value could not contain the word History. Otherwise your other search would change it from Art to History.
And perhaps you have some majors that don't actually include the 'key' value.
Is Business Statistics a Business degree or a Statistics degree (or program or major or concentration or ??).
I might take a different approach then "if / then /else" by creating a list of all of the unique values and creating a format or even an Informat, to allow use of Just and Upcase options for comparison,
Need further help from the community? Please ask a new question.