I've inherited a 3,500-line SAS program that ran in a mainframe environment, and I have to refactor it for a UNIX environment. The code is a mess, to put it lightly. Are there any tips or tricks for dealing with this situation in the SAS world? Thank you for any help.
None of this will actually "fix" anything, but it may provide a guideline for finding out what the program uses.
My first step on something like this is to apply a consistent code layout: indentation, for instance, especially on multilevel IF/THEN/ELSE or DO/END blocks of code.
I like to have any external file references such as Libname and Filename statements at the top of the code, followed by any custom formats (Proc Format), even if they are not used until well into the body.
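As a rough sketch of that layout, the top of a program might look something like the following. Every path, libref, fileref, and format name here is a made-up placeholder, not something from the original program. Putting the root directory in one macro variable is my own suggestion; it can make a mainframe-to-UNIX move easier because the paths only change in one place.

```sas
/* All names and paths below are hypothetical placeholders. */
%let projroot = /path/to/project;

/* External references first, so every file dependency is visible in one place */
libname  rawlib "&projroot./data";
filename claims "&projroot./input/claims.txt";

/* Custom formats next, even if they are not used until much later */
proc format;
    value agegrp
        low -< 18  = "Under 18"
        18  -< 65  = "18 to 64"
        65  - high = "65 and over";
run;
```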
If there are any macros involved then have those together near the beginning of the program.
Within data step blocks it may be helpful to group all the declarative statements such as Informat, Format, Label, Attrib, Length, and Array together near the start of the data step. Add Label statements for as many variables as practical once you identify the purpose or use of a variable.
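A minimal sketch of that grouping, assuming a hypothetical input data set work.raw with numeric variables q1-q4, visit_dt, and amount, and character variables region and status (all of these names are invented for illustration):

```sas
data work.clean;
    /* Declarative statements grouped at the top of the step */
    length   region $ 12 status $ 1;
    informat visit_dt yymmdd10.;
    format   visit_dt date9. amount dollar12.2;
    label    region   = "Sales region"
             status   = "Account status code"
             visit_dt = "Date of visit"
             amount   = "Billed amount";
    array q{4} q1-q4;

    /* Executable statements follow */
    set work.raw;
    total_q = sum(of q{*});
run;
```

One thing to watch: a Length statement placed before the Set statement decides the stored length of those variables and moves them to the front of the variable order, so check that the lengths you declare are not shorter than the ones in the incoming data.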
Look very closely for use of the:
Data thisdatasetname;
set thisdatasetname;
structure. It is not uncommon for some programmers to do this just to add one or a few variables. If that is the case, move the calculation to a previous data step. On one project, this single approach removed 15 data step calls.
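To illustrate the kind of consolidation I mean, here is a made-up before and after. The data set and variable names (work.sales_raw, amount, tax, total) and the 0.07 rate are invented for the example.

```sas
/* Before: three passes over the same data */
data work.sales;
    set work.sales_raw;
    where amount > 0;
run;

data work.sales;
    set work.sales;
    tax = amount * 0.07;
run;

data work.sales;
    set work.sales;
    total = amount + tax;
run;

/* After: one pass, same result */
data work.sales;
    set work.sales_raw;
    where amount > 0;
    tax   = amount * 0.07;
    total = amount + tax;
run;
```

This is only safe when nothing between the original steps (a Proc Sort, a merge, a report) depends on the intermediate version of the data set, and when the statements keep their original order.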
If you find long blocks of if/then statements that do recoding, it may be worth considering custom formats/informats to move the logic out of the data step and into Proc Format.
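A small sketch of that swap, using an invented data set work.patients with a character variable state. The format name $statefmt is also just an example.

```sas
/* Before: recoding with a chain of IF/THEN statements */
data work.recoded_if;
    set work.patients;
    length state_name $ 20;
    if      state = "CA" then state_name = "California";
    else if state = "NY" then state_name = "New York";
    else if state = "TX" then state_name = "Texas";
    else                      state_name = "Other";
run;

/* After: the lookup lives in one format */
proc format;
    value $statefmt
        "CA"  = "California"
        "NY"  = "New York"
        "TX"  = "Texas"
        other = "Other";
run;

data work.recoded_fmt;
    set work.patients;
    length state_name $ 20;
    state_name = put(state, $statefmt.);
run;
```

The format can then be reused in any later step or procedure, so the recode is defined once instead of being repeated wherever it is needed.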
Do you actually have GOTO or LINK statements involved?
I also tend to segregate the data manipulation parts, Data step and Proc Sql as much as practical from any final report output.
I don't think there is any silver bullet for this type of programming. Just tidying up the layout so it becomes more readable would be where I would start. Then it will become clearer where further improvements can be made.
Also, I like to draw a diagram of all the data set names as they are used in the program, to see which table is used where as the code progresses.
