Coming to the world of SAS® from a background in computer programming has given me some interesting insights. I’ve learned that many people who program in SAS are consultants or work individually, sometimes as the sole maintainer of their code. Since SAS code is designed for tasks like data processing and analytics, SAS developers working on teams may use different strategies for collaboration than those used in traditional software engineering.
Whether a programmer works individually, on a team, or on a project basis (delivering code and moving on to the next project), a number of best practices from traditional software engineering can improve their SAS code. These practices make code easier to read and maintain, and make it easier to understand (or remember) why it was written the way it was.
I will give a brief overview of some of these best practices. For more best practices, a deeper dive into the concepts I mention here, and code samples, see the full paper and presentation, linked at the end of this post.
The idea here is to focus on the future. Your code will probably be modified by other people, whether they are taking over responsibility for a project or are new to the team. Also consider yourself to be a future user. How many of us have revisited code we wrote a few months or years later, only to find that we can’t understand why we did things that way?
I suggest developing a set of coding standards and following them. Whether you are on a team or working alone, standards can help unify your code and make it easier to read and reuse. Consistency across projects (and even across programming languages, if you use more than one) also makes it easier to switch between them.
In SAS, standards can include descriptive variable names that follow a consistent format, consistent and effective use of whitespace, and modularity (e.g., macro variables, macros, and functions). You can have standards for anything, including date formats (e.g., using the ISO 8601 format), dataset names, and a header template for your code files.
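As a rough illustration of what such standards might look like in practice, here is a minimal sketch. The file name, dataset names, variable names, macro name, and data values are all hypothetical, invented for this example; your own standards may well differ.

```sas
/*----------------------------------------------------------------
  Program : summarize_visits.sas   (hypothetical file name)
  Purpose : Summarize visit costs by site for one reporting period
  Author  : <your name>
  Created : 2017-08-31   (ISO 8601 date)
----------------------------------------------------------------*/

/* Macro variables keep values that change between runs in one place */
%let report_start_date = 2017-01-01;
%let report_end_date   = 2017-06-30;

/* Small made-up dataset so the example is self-contained */
data work.clinic_visits;
    input site_id $ visit_date :yymmdd10. visit_cost;
    format visit_date e8601da.;   /* display dates as yyyy-mm-dd */
    datalines;
A01 2017-02-14 120.50
A01 2017-07-03 80.00
B02 2017-03-22 45.25
;
run;

/* A macro wraps a reusable step so it is written only once */
%macro summarize_by_site(input_data=, output_data=);
    proc means data=&input_data noprint nway;
        where visit_date between input("&report_start_date", yymmdd10.)
                             and input("&report_end_date", yymmdd10.);
        class site_id;
        var visit_cost;
        output out=&output_data (drop=_type_ _freq_)
               sum=total_visit_cost n=visit_count;
    run;
%mend summarize_by_site;

%summarize_by_site(input_data=work.clinic_visits,
                   output_data=work.site_cost_summary);
```

The particular choices here matter less than applying them the same way everywhere: one header template, one naming pattern, one date format.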
I can’t stress the importance of version control enough. All programmers should be using version control. Rather than saving copies of your code (e.g., code_v2.1.sas, code_old.sas, code_2017-08-31.sas, etc.), you can use version control software to track changes to a set of code over time, in a way that scales as the project grows. Version control also makes it easier to work with others. Anyone involved can see the changes you made, and quickly get the latest versions as they are made.
Version control has saved me at least several days of work, and that’s just in the year and a half I’ve been with the organization. A couple of times, I’ve accidentally overwritten my code, or forgotten about some change I made a while back. By looking at my commit history, I was able to find the relevant previous version and restore it, so I didn’t have to do the work again. This only worked because I incorporate version control as a fundamental part of my development practice.
You may have heard of GitHub, which is a company that hosts many repositories in the cloud. These are often open-source projects (i.e., everyone can see your code, and potentially contribute themselves). GitHub also has plans that allow for private repositories. Desktop clients like TortoiseGit and TortoiseSVN let you work with Git and Subversion repositories, including ones that are not public.
Version control was designed with collaboration in mind. It works really well, and doesn't require a big time commitment to learn the basics that are used for 99% of the workflow.
Good code can be self-documenting, meaning it’s easy to follow and understand. Still, comments take that further and allow you to explain to other humans what the code is or should be doing. I strongly advise using comments wisely. Don't write comments about code that is obvious or easy to understand. Use comments to supply information that is not obvious from the code itself. Is your code doing something clever or complex? Add helpful comments to remind yourself (and tell others) of its purpose.
Rather than commenting out old code, let version control track the changes you’ve made. You can always go back and see what was changed, when, and by whom. Your documentation can mention the specific commit(s) in which changes were made.
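To make the difference concrete, here is a small sketch contrasting a comment that merely restates the code with one that explains why the code exists. The dataset, the variable names, and the meaning of the value -9 are all made up for this example.

```sas
/* Hypothetical example data */
data work.raw_scores;
    input subject_id score;
    datalines;
1 88
2 .
3 -9
4 71
;
run;

data work.clean_scores;
    set work.raw_scores;

    /* Unhelpful: restates the code.
       Set score to missing if it equals -9. */

    /* Helpful: explains why.
       In this (made-up) source system, -9 is the code for a refused
       answer, so treat it as missing rather than as a real score
       that would pull the average down. */
    if score = -9 then score = .;
run;
```

The first comment tells a reader nothing the IF statement does not already say; the second records a fact about the data that cannot be recovered from the code alone.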
I hope this was a quick read that piques your interest in the concepts and practices mentioned. This blog post is based on my paper "Code Like It Matters: Writing Code That's Readable and Shareable," which was presented at the Midwest SAS Users Group conference in October 2017 and at the Minnesota SAS Users Group meeting in June 2017. For the full paper and slides, you can view the paper’s webpage.