EG: Program Analysis, PROC SORTs, use of SCAPROC output

PhilC · 02-04-2020

Enterprise Guide can use PROC SCAPROC to analyze a program file and turn it into a process flow (under the Node menu > "Analyze program" > "Analyze program flow"). Please develop a way to interpret the results so that if 1) a section of code produces one named dataset and 2) that code is immediately followed by one or more PROC SORT steps whose output has the same name as the dataset in 1), then EG creates only one node when building the process flow. That node should contain the code from 1) together with the code of every PROC SORT step recognized in 2).

Example illustration of the desired process flow:

The current behavior of Enterprise Guide is to treat each of the PROC SORT steps as a separate node. If the code being analyzed is full of such sorts, the resulting process flow is littered with these structures. I argue that they are noise that distracts the user from the more meaningful structures. Secondly, once the process flow has been created, the order in which the nodes run is not visually clear, because of the loopbacks the program requires.

Example illustration of current behavior:

Example code that is being analyzed:

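/* 1) Code that produces one named dataset */
data cars_;
  set sashelp.cars (keep=Make MSRP);
run;

/* 2) Sorts that follow immediately and overwrite that same dataset */
proc sort data=cars_;
  by Make MSRP;
run;

proc sort data=cars_ nodupkey;
  by Make;
run;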
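
For anyone who wants to see the raw analysis that a process flow could be built from, here is a minimal sketch that records the same program with PROC SCAPROC directly. The output path is a placeholder I made up, and I am assuming (not confirming) that this resembles what EG does internally when it analyzes a program.

```sas
/* Start recording. The file path is a hypothetical placeholder;    */
/* point it at any location your SAS session can write to.          */
proc scaproc;
  record '/tmp/cars_scaproc.txt' attr;  /* ATTR adds variable attributes */
run;

/* The program being analyzed */
data cars_;
  set sashelp.cars (keep=Make MSRP);
run;

proc sort data=cars_;
  by Make MSRP;
run;

proc sort data=cars_ nodupkey;
  by Make;
run;

/* Stop recording and write the analysis to the file named above */
proc scaproc;
  write;
run;
```

The recorded file should contain the program annotated with JOBSPLIT comments describing the inputs and outputs of each step. Both sorts read and write the same dataset as the DATA step creates, which is the pattern I would like EG to collapse into one node.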

I believe there is a good case for adding this exception to the "program analysis for process flow" algorithm because the use of PROC SORT is so common.

Thanks

ballardw · 02-04-2020

Since PROC SORT changes the content of the data, I am not sure that everyone will agree that the sort steps should be removed from the flow.

For one thing, seeing the sort nodes might help someone realize that one or more sorts are not actually needed (re-sorting by the same variables) or that two sorts could be combined (adding one or more levels to an earlier sort).
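
A hypothetical illustration of the "could be combined" case (this is not taken from the program in the original post; the dataset name is only reused for convenience):

```sas
/* Build a small working dataset to sort */
data cars_;
  set sashelp.cars (keep=Make MSRP);
run;

/* Two sorts, where the second only adds a level to the first */
proc sort data=cars_;
  by Make;
run;

proc sort data=cars_;
  by Make MSRP;
run;

/* The first sort above is unnecessary: this single step leaves     */
/* the data in the same final order, assuming the default EQUALS    */
/* behavior of PROC SORT.                                            */
proc sort data=cars_;
  by Make MSRP;
run;
```

With separate nodes in the flow, a pair like the first two sorts is easy to spot.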

This applies especially to your example, where the second sort removes a lot of data. By looking at the sorts, someone may find out why some result isn't as expected.

PhilC · 02-04-2020

@ballardw, if that were the case, wouldn't we want a separate icon for each time the same dataset is touched, one that also shows the order in which the nodes are supposed to run? Otherwise it's quite vague to me what is happening.

When I created this, I copied the nodes into another process flow. The sort nodes switched around on me, and I couldn't tell them apart unless I looked at the code inside each node.

PhilC · 02-04-2020

And again, I'm not trying to change the current behavior; I'm looking for an option to switch on this other behavior.