Re: Survey - what's your EG style?

mftuchman · Posted 07-09-2009 04:44 PM

How do you like to work in EG? A single code node? A few code nodes in sequence? Lots of queries spread out in a flow chart? What internal guidelines do you follow to make your work manageable for yourself or for others?

I'm putting together a list of style pointers for nice EG. Here are some that I use based on my own experience. Please don't interpret them as rigid commandments. The purpose of this is to gather feedback. Particularly good ideas may be mentioned explicitly in a NESUG poster (with the author's permission, of course).

* One Idea per node, particularly code nodes
* Minimize crossings of arrows
* I Never (OK, rarely) terminate a branch with a data set. Branches should be terminated by reports of some kind.
* I Always rename queries. I rarely know two weeks later what "query1" is supposed to do.

That's just a sample, and I bet you have even better ideas.

deleted_user · Posted 07-10-2009 12:56 AM

I am all over the map. I use EG for data exploration primarily. If I am just testing out some ideas, checking the data, I may not even save the project.

The first thing I do, ALWAYS, is run Characterize Data to check the data quality. If your data are no good, everything else is a waste of time.
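For anyone who prefers a code node for that first look, here is a rough sketch of a comparable data-quality check. It is a hand-written approximation, not the code the Characterize Data task generates, and `sashelp.class` is only a stand-in for whatever table is being explored.

```sas
/* Stand-in for the table being explored (an assumption) */
%let ds = sashelp.class;

/* Variable names, types, and lengths */
proc contents data=&ds;
run;

/* Numeric variables: counts, missing values, and range */
proc means data=&ds n nmiss min mean max;
run;

/* Character variables: frequency of each level, missing included */
proc freq data=&ds;
    tables _character_ / missing;
run;
```

Pointing `ds` at a different table is the only change needed to reuse the node.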

For anything I plan to save, though, I always change the query name, because I run so many analyses that I would never remember what each one does.

Yes, I end everything with a report.

Doc_Duke · Posted 07-10-2009 03:05 PM

I do NOT always end in a report.

I will often organize the work so that I have a process flow devoted to data transformations (ETL), so the goal is to create one or more SAS datasets.

The other process flows are devoted to particular sets of analyses on a set of data; those end in reports.

prholland · Posted 07-10-2009 03:36 PM

My initial development is based on a network of EG-local code nodes, which are self-sufficient, so I could run them individually, if necessary. However, the network forces me to run them in a sensible order. I normally limit the number of programs on each Process Flow to fewer than a dozen, and keep the output names unique, otherwise the flow becomes unmanageable.

Working in a pharma environment generally means that I have to save these programs to a server at some point, so the EG environment, with its program links, is then used to help me run them in their dependency order, and also to include document links that help me find things again in the future.

Full details of this can be found in my SAS book, "Power User's Guide to SAS Programming", at www.hollandnumerics.com, and my paper "Running Clinical Trials Programs with Enterprise Guide", at www.hollandnumerics.com/SASPAPER.HTM.

...........Phil

mftuchman · Posted 07-11-2009 11:59 PM

Thanks for all the replies so far! To doc@duke - in subsequent flows after the initial data set is built, how would you characterize your style?

Longhow · Posted 07-11-2009 06:32 AM

I have been using EG for two main purposes. Creating a "final" analysis set for modeling from different sources, and for adhoc "on the fly" data investigations.

Within one project I structure my work on the different process flows.

Sheet one usually contains: import tasks/nodes from different sources
Sheet two contains the merges and manipulation steps.
Sheet three contains the exploration task to check/analyze the data.

Starting with SAS/BASE like most users, I think the point is not to think like a base user. I have seen people use EG with one or two icons in the whole project: SAS code nodes where everything is done. To benefit from EG as much as possible, I try not to write any SAS code and use the built-in tasks wherever I can.

I have taught myself to rename the tasks to something more meaningful than the default name.

I have found EG very useful in creating and maintaining SAS code. It is a shame that not everybody (or every company) is using this nice SAS software.

mjw149 · Posted 07-11-2009 04:44 PM

My style, coming from less of a SAS background, is to keep everything in one main SAS program, but to use the nodes, arrows, icons, and the 'filter and query' stuff for verification and exploration. I leave that stuff in for future reference, but the main program always stays on top.

And I do use flows for import/export type stuff. And I love to leave notes.

I separate out to different tabs when there is old code or experimental code, and that way all the exploratory and verification stuff is kept together with the relevant data.

mftuchman · Posted 07-12-2009 12:03 AM

I also find that after a while, I don't want to wait for the yellow and greens to flash. When I feel comfortable with a process, I sometimes reduce it to a single code node. I wish there were a less manual way to do this. I suspect your ETL methodology is related to what I want to do.

Longhow · Posted 07-12-2009 10:50 AM

> I also find that after a while, I don't want to wait
> for the yellow and greens to flash. When I feel
> comfortable with a process, I sometimes reduce it to
> a single code node. I wish there were a less manual
> way to do this. I suspect your ETL methodology is
> related to what I want to do.

Why would you reduce it to a single code node? You would lose the view on "what's going on". The network diagram visualizes the step-by-step approach of the whole process. This is one of the things I like: EG almost forces you to break down the process into steps. That makes the flow easier to maintain, explain to others, and extend.

mftuchman · Posted 07-12-2009 12:05 AM

I like the idea of one flow for building, one for analysis, one for exploration, and one for business reporting.

Thanks, all, for taking the time. Keep them coming though. I'd like to hear how you keep the process flows themselves manageable, as contrasted with EG habits in general.

RichardH_sas · Posted 07-13-2009 11:33 AM

Based on lots of trial and error, I arrive at similar conclusions to a lot of folks on here.

* One process flow for data import and basic manipulation
* Sometimes a second process flow for more data stuff if there's more complex manipulation like summarization, joins, heavy data cleaning
* One or more process flows for reporting and analysis
* One process flow for data verification. This is a new one for me... when I run all those one-way frequencies/characterize data/query tasks to look at data values, I'm trying to get in the habit of immediately moving that stuff into a verification process flow. A mistake I tend to make is getting my ETL process flows clogged with a bunch of side-tasks designed to look at data values.
* Anything with a parameter gets a separate process flow. Usually, I use parameters to create little mini-applications, hence they get their own separate process flows.

I will often use tasks rather than write code these days unless the code has a specific advantage. There are times when I know I can do 5 different things in a single data step that would take several other tasks in succession to duplicate. I *always* use the right-click "link to" feature to make sure my code and the objects it generates stay in the process flow with everything else.
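To illustrate the "5 different things in a single data step" point, here is a minimal sketch. It is not taken from anyone's actual project: the input is the `sashelp.class` sample table, and the output name `work.class_prepared` and the derived columns are made up for the example.

```sas
/* One DATA step doing five jobs that would otherwise need several tasks */
data work.class_prepared;
    set sashelp.class(rename=(Name=Student));   /* 1. rename a column     */
    where Age >= 12;                            /* 2. filter rows         */
    length AgeGroup $8;
    BMI = (Weight / (Height**2)) * 703;         /* 3. computed column     */
    if Age < 14 then AgeGroup = 'Younger';      /* 4. recode into groups  */
    else AgeGroup = 'Older';
    drop Weight Height;                         /* 5. drop unused columns */
    label BMI = 'Body mass index';
run;
```

Done with tasks, this would probably be a chain of query or filter steps, which is the trade-off being described: fewer nodes, but less of the logic visible in the flow.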

The one thing I haven't figured out an elegant solution for is set-up code like LIBNAME statements. If you're going to write any code in EG, having dedicated libraries rather than the EC00001 references keeps you from driving yourself insane. Right now, I put a code node at the top of every process flow that's named (all in capital letters) RUN ME FIRST!!!. Sometimes I still forget to run it, and then everything breaks. I'm toying with having a process flow dedicated just to the starting code, but I'd still have to remember to start by running that process flow before any other. Sigh. It'd be cool if EG had a "run this code when the project starts" setting. I do lots of different things in EG, so my libraries and data are changing all the time.

One last thought: the one thing I do not like EG for is writing SAS code for application development. Writing little bits of SAS code that can be used in conjunction with tasks works great. But if I'm going to write 500+ lines of code to create an application, I still find the SAS windowing environment more convenient for that. You can pull up a couple code windows there, keep your master code in one and keep all the junk code (PROC FREQs, stuff that didn't work, etc.) in the others. And no constant reminders "Would you like to replace the previous results?" when you rerun something. 🙂

Richard

Jay · Posted 07-13-2009 03:21 PM

>>>It'd be cool if EG had a "run this code when the project starts" setting

In EG 4.2: Tools > Options > SAS Programs > "Submit SAS code when server is connected". Paste your SAS code in that window; this might accomplish what you want?

Jay

RichardH_sas · Posted 07-15-2009 07:01 AM

Thanks Jay! I tried that, but it's code that gets submitted for every project you're working with. I need preliminary code that's project specific... Tuesday I may be working with a SAS library of class enrollment data and Thursday I may be working with a SAS library of retail sale data. Things like libraries and macro variables I need to set specific to the project.
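For reference, a project-specific "run me first" node of the kind described here might look something like the sketch below. Every name in it is hypothetical: the `project_root` path, the `term` macro variable, the `enroll` libref, and the `check_lib` helper are all invented for illustration.

```sas
/* RUN ME FIRST: project-specific set-up (all names and paths are hypothetical) */
%let project_root = /data/projects/enrollment;   /* assumed location of project data */
%let term         = 2009FALL;                    /* assumed project parameter        */

libname enroll "&project_root./sasdata";

/* Report in the log whether the library was assigned */
%macro check_lib(lib);
    %if %sysfunc(libref(&lib)) ne 0 %then
        %put ERROR: Library &lib is not assigned. Check project_root.;
    %else
        %put NOTE: Library &lib is ready.;
%mend check_lib;
%check_lib(enroll)
```

Keeping everything that changes between projects in the two `%let` lines means the rest of the node can be copied from project to project unchanged. It still has to be run first, so it does not solve the "forgot to run it" problem on its own.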

UCFAngel · Posted 07-23-2009 12:39 PM

When I need to create a process flow that runs in a particular sequence, I create an ordered list. When you create an ordered list (similar to a macro in MS Access) you select how you want the nodes ordered. This capability is found under Tools > Create Ordered List.

I also use the "linking" feature so my process stays in the order I want and I always rename my nodes and output data to something I can interpret later. I find that numbering the steps in the flow helps me when I select the order for the list.
