04-03-2014 10:31 AM
I've always been interested in the various coding styles and practices of my fellow SAS programmers. I have a question today:
For a project, do you prefer to keep all of the coding in one document (not including macros)? Do you divide the program up? If you divide it up, how do you decide how to separate it?
As for me, I tend to keep all of my program in one place. That was how I was taught, but I really want to understand what other coders think is efficient or a better way of doing things.
04-03-2014 10:43 AM
My personal choices usually come down to the scale of a project. Often if it uses existing data then it will fit into one program.
Projects that involve bringing in data from one or more sources usually end up with:
A program to read each data source
Validate and/or clean each data source
Recoding and transformation for analysis
Reporting program(s). Report programs will often be separate for different types of reports such as maps, rankings, summary tables, regressions etc.
Creation of external data files if needed
04-03-2014 11:02 AM
The first three (the 'build' process) might be three separate programs for each data source or one combined program. It rather depends on the work to be done.
For repetitive production runs, they are either combined with a %include or run as separate programs with a batch file wrapper. The advantage of separate programs is that a failure of one will not flush the rest, so ongoing maintenance is facilitated. [Unfortunately, we are not always notified of changes to data sources.]
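As a rough illustration of the %include approach, a driver program might look something like the sketch below. The folder and the file names are made up for the example; they simply mirror the read / clean / transform / report split described earlier in the thread.

```sas
/* driver.sas: hypothetical driver program.                        */
/* The project folder and program names are placeholders, not      */
/* anything taken from a real project.                             */
%let root = /projects/demo;

/* SOURCE2 echoes the included source lines to the log, which      */
/* makes it easier to see which program a message came from.       */
%include "&root./01_read_source_a.sas"   / source2;
%include "&root./02_validate_clean.sas"  / source2;
%include "&root./03_recode_transform.sas" / source2;
%include "&root./04_reports.sas"         / source2;
```

Because everything runs in one SAS session here, WORK datasets and macro variables carry over from one included program to the next. Running each program as its own batch job gives up that convenience in exchange for isolating failures.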
Reports often end up growing organically, as the client often cannot articulate their needs well enough for complete specs at the beginning. They can be one or multiple programs.
04-03-2014 01:23 PM
I have used a driver program that was actually written using SAS/AF, but that was for a project that ran daily, where a main part was detecting instrument malfunctions at remote sites so that repair or maintenance work could be scheduled quickly. And since the instrumentation changed moderately often, we had checks to see when the incoming data format differed from the previous one so we could set the change date/time in the read programs ...
My current work tends to involve longer intervals between data deliveries, changes at frequent intervals, and mostly ad hoc reports as requested, so I don't even attempt to automate much.
04-08-2014 02:25 AM
I decide from a maintenance point of view how to separate the code over multiple files.
Another point to consider is how you schedule your programs: using 3 scheduled jobs (for example, extraction in scheduled job 1, transformation in scheduled job 2, and reporting / data upload in scheduled job 3) makes it easier to re-run certain jobs when a problem has occurred. (Example: if a problem occurs during transformation, your extraction process does not need to re-run, so your maintenance window can be smaller.)
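If the stages are kept in one driver rather than in three scheduled jobs, a similar "re-run from the failed stage" effect can be approximated with a small macro. This is only a sketch: the macro name, the start_stage variable, and the file paths are all invented for the example, and it assumes each stage writes its output to a permanent library so that a later stage can pick it up in a fresh session.

```sas
/* Hypothetical restartable driver: all names and paths are placeholders. */
%let root        = /projects/demo;
%let start_stage = 2;   /* 1 = extraction, 2 = transformation, 3 = reporting */

%macro run_from(start);
  /* Each stage runs only if it is at or after the requested start stage.  */
  /* The %do/%end wrapper keeps the semicolon with the %include statement. */
  %if &start <= 1 %then %do;
    %include "&root./extract.sas";
  %end;
  %if &start <= 2 %then %do;
    %include "&root./transform.sas";
  %end;
  %if &start <= 3 %then %do;
    %include "&root./report.sas";
  %end;
%mend run_from;

/* With start_stage = 2, extraction is skipped and the run resumes */
/* at transformation.                                              */
%run_from(&start_stage)
```

Separate scheduled jobs achieve the same thing without any code, since the scheduler decides which job to re-run; the macro is just one way to get that behaviour when a scheduler is not available.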
04-08-2014 04:35 AM
I would concur with all the points above. It depends greatly on the scale of the project, the number of people working on it, and how things operate within the project. There are positives and negatives to each scenario. One thing to bear in mind is that nowadays there is a big shift towards programs which are easy to read and maintain rather than compressed, complicated code, for the obvious reasons that storage is no longer such an issue and speed of execution is, for most people, also not too much of a problem. However, coming to a project 6 months after all the developers have left and then trying to work out what that funky bit of code does can be resource intensive, and such code can also be quite complex to validate.