06-02-2016 03:36 PM
I've been coding in mainframe and PC SAS for decades and made the transition from PC SAS to EG (currently EG 7.11) about 6 months ago when I started a new job. I have inherited a large egp (~150 objects) which is all in one process flow. The flow is currently about 3 screens wide and 3 screens tall and a nuisance to navigate. (It's big enough that zooming out to see all of it makes the icons unreadable.) I'm in the process of trying to tame it. I wondered how other experienced SAS programmers approach organizing complex tasks. It would be nice if SAS had a subprocess flow object so we could have a top-level flow which broke out into smaller ones...
- This mess is our development version. Individual programs are compiled to a macro library so end users can call them from a much simpler interface egp.
- The author is a visual thinker with data science skills but not a lot of SAS experience. He prefers to have each data step or proc in a separate program so the process flow is analogous to a logical flow chart. I'm inclined to combine some of these nibbles into larger programs and comment them. Some things are done by query builder, so there is a limit as to how much I can combine things. (Folks in this shop seem a lot more comfortable using a query builder than coping with code exported from query builder.)
- How do folks find the balance between splitting into multiple process flows or not? One way to do this is to split it out into separate process flows, retaining the same level of granularity for ease of understanding the logic. The other approach is to combine into larger programs and thus simplify the one giant process flow. (A third approach would be to unembed the programs and have them included in both an overall process flow and a series of smaller process flows to forcibly create the subprocess-flow object which SAS does not provide. I am guessing this would cause unintended consequences and am not going to try it.) Since you cannot link an object on one process flow to an object on another, there is information loss inherent in splitting it up.
- With a project this complex, it is easy to accidentally save temporary changes and complex to figure out what you did. Do you use the version control in the egp to check the code in and out, or do you keep the code unembedded and manage it in the OS/directory structure?
I know these questions are all a matter of "feel" but I see no discussions of them on the web, and I would be interested to hear how others approach this. As an experienced SAS programmer, do you just organize your project in big programs but use EG because it's cheaper, or do you use EG to develop a different, more visual style of code development?
06-02-2016 05:23 PM
First of all, I'm delighted to see the positive attitude you're taking to transitioning to EG. I've met a lot of "old school" SAS programmers who are much less willing to give it a try.
I've encountered some of the same questions, but not at as large a scale. To be honest, I do all of the things you suggest, pretty much on a case-by-case basis, with the objective of making the code base as easy-to-understand and maintainable as possible.
One thing I REALLY don't like is having the process flow go off the screen to the right...to me, that's absolutely the time to start refactoring.
The only time I go down to the level of a single data step or proc step in a node is if it's either a major data step, or a significant proc. Otherwise, I group data steps and procs into a node based on what I feel is a "functional" unit.
I really agree with having non-programmer types use queries, rather than trying to write SQL code. I've seen too many people get bitten by SQL.
I haven't used version control, so no thoughts.
You haven't mentioned partitioning into multiple EG projects. That's another option that I frequently use.
I think you hit the nail on the head, when you say that these questions are a matter of "feel". As long as we make good efforts to organize things, I doubt there's one "right" answer to any code base.
I'm really glad you posted this; it's a fascinating topic. I think that EG brings new options for organizing code that we're just beginning to exploit. I hope lots of people join in this discussion!
06-02-2016 08:39 PM
Thanks @TomKari for replying and for the mention. A while ago I posted some tips for organizing your EG projects, and I think they are still as relevant as ever. @KHaavik - check them out and let me know what you think.
Two more tips:
06-02-2016 08:50 PM
Thanks for your reply. I read your blog article when I first started this process and still refer to it. It was one of the few useful resources I found.
I am indeed using stickies. I see learning the best way to use them as an extension of the question of how granular to make your code -- comments go in code and stickies comment the process flow. My current draft of this giant mess is split into several process flows with a sticky in lieu of a code header at the start of each one. Resized to be instantly readable, they are a great way of making the project self-documenting.
I'd still love to have a subprocess object some day if SAS is willing.
06-03-2016 04:22 AM
A very interesting discussion. I'd like to respond to the last question in the original post, as no one has to date:
"As an experienced SAS programmer, do you just organize your project in big programs but use EG because it's cheaper, or do you use EG to develop a different, more visual style of code development?"
The answer to this, I think, is "it depends". Doing your EG applications in process flows makes more sense if these are personal projects for your team, work group or department. In this situation changing, supporting and maintaining the apps is more of a personal responsibility and is handled in an informal way. These apps can be easily shared and run by other users, and EG prompts can allow users to produce their own results. It is important to note that EG projects can only be run or scheduled on Windows PCs.
On the other hand, if your EG applications are for business-critical processes, require collaborative development by more than one person, need formal change control, or are required to run in a server-based Production environment, then I suggest that programs coded in EG are the way to go. You can still use process flows for the code development, but then you can develop suites of programs that can be linked using techniques like %INCLUDE and autocall macros so they mimic linked process flows. It is much easier to version control code, tracking all changes, than it is with process flows. When it comes to deploying apps to Production, where they would typically be scheduled as batch jobs, this can only be done with coded programs.
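As a rough sketch of the %INCLUDE and autocall approach mentioned above: the file names, the summarize macro, and the use of the WORK directory as a stand-in for a shared program folder are all assumptions made up for illustration, not anything from this thread.

```sas
/* Stand-in for a shared program folder. A real setup would point at a
   version-controlled directory instead (assumption for illustration). */
%let root = %sysfunc(pathname(work));

/* Write a small "step" program to disk so the sketch is self-contained. */
data _null_;
  file "&root/step1_extract.sas";
  put 'data work.raw; set sashelp.class; run;';
run;

/* Write a hypothetical autocall macro. The file name must match the
   macro name (lowercase on UNIX hosts). */
data _null_;
  file "&root/summarize.sas";
  put '%macro summarize(ds=);';
  put '  proc means data=&ds; run;';
  put '%mend summarize;';
run;

/* Add the folder to the autocall search path, keeping the SAS-supplied one. */
options mautosource sasautos=("&root" sasautos);

/* Driver: each included program plays the role of one node in a process flow. */
%include "&root/step1_extract.sas";

/* The first call compiles summarize.sas from the autocall folder, then runs it. */
%summarize(ds=work.raw)
```

In practice the step programs and macros would already exist as files under version control, and only a short driver like the last few statements would be scheduled as the batch job.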
06-14-2016 11:19 AM
Hey @TomKari, yes, we are listening! We too would love for EG to support sub-process flows. It is on our list of potential new features, but hasn't bubbled up yet. (Fyi... SAS Studio's process flow perspective does have the concept of nested process flows.)
06-03-2016 06:14 AM
I also have to answer - it depends.
But to give some guidelines it's necessary to understand what the project/programs do, why and for whom.
If the main task is data management type of processing, in a centralized environment, EG is a no-no.