In our book Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study, Paul Dorfman (@hashman) and I (@DonH) talked about how hash tables can do a lot more than just table lookup.
As the title indicates, we focused on its use in Business Intelligence tasks. Its use as a tool in generating documentation is our point here. We all know that programmers and developers just love to spend their time writing documentation (and gee do I wish there was a sarcasm font for that statement). One thing I have tried to do in my projects is to come up with ways to produce documentation using programs. For example, writing programs to pull off the header comment block for each program or macro.
Recently I was engaged on a project to help upgrade several applications from 9.1.3 to 94. To say they were not well designed or implemented would be an understatement. The code was scattered all over the place and as I started my review it was difficult to remember which of 10-20+ directories certain programs or macros were located in.
So the first thing I did was write a SAS program using wildcards to read in all the code lines from all the directories so I could search and generate some simple listings like an alphabetical list of each macro and where it was stored on the file system as well as a calling hierarchy for each application. Something like this what you see in the image at right.
Note that the data for this listing was obtained by subsetting the data for one of my client's real applications. The generated documents for each of those applications were upwards of 20 pages long - and that is even after I did some things to shorten the listings. For example, some utilities were called dozens and dozens of time from a single macro - thus I consolidated those into a single row called lineNumbers. In addition to subsetting, I also changed the names of all the programs to make them more generic.
The data that was used to generate this listing is simple. For each program or macro, there is a row for each program or macro that it calls. For example, SampleDriver.sas calls:
and processData calls:
and so on, and so on.
Typically this would require recursion, as you go down the hierarchy and then back up, you need to keep track of where you are. For example, backing up from GetColumnNames to upddateMetrics to (getMetricDefinitions-GetColumnsNames-calculateMetrics-getDataSourcInfo), you have to keep track of what lines have already been written to the hierarchy. This snippet of data is not that hard. Remember, however, that this is a simplified subset and the hierarchy is somewhat consistent (i.e., you don't go level 2-3-3-4-3-4-5-5-6-7-2 ......)
To say that recursive programming can be complex would be an understatement. That is where the hash object comes into play - it is an ideal tool for that since your table is in memory and there are methods to navigate through its rows.
My initial thought was to use a stack or a queue (covered in Chapter 10: The Hash Object as a Dynamic Data Structure) to do that. Before getting too far into that approach it occurred to me that if I had a table for each macro with what it called, it might be simpler to deal with the recursion. And that idea is covered in Chapter 9: Hash of Hashes - Looping Through SAS Hash Objects. Just create a hash table for each macro. Once I adopted that approach, the recursive nature of the problem remained. But I did not have to write recursive code. I simply needed to query the right hash table for each macro in the right order. So my recursion problem got replaced by iterating through my Hash of Hash (aka HoH) tables.
The code to do that follows.
%let root= C:\HCS\Projects\callingHierarchy; libname code "&root\Data"; %let Driver = SampleDriver; data _null_; if 0 then set code.callingHierarchy; length Calling Called $32;
Next we need to create our hash tables. They only need to be defined once; thus, they are contained inside a DO-END block whose statements are only executed on the first execution of the DATA step.
Our HoH hash table will contain the pointers to the hash object instances (or hash tables) for each program that calls other programs. Note that the variable Key must be scalar (i.e., numeric or character), and so we use the Program name. That is also needed, so that the FIND method could be used to retrieve the non-scalar values H and HITER for each individual hash table and the associated hash iterator from the table HoH.
Next, we create the input and output hash tables to contain our calling hierarchy. We also declare two corresponding hash iterator objects to cycle through the tables serially as many times as needed until our hierarchy is fully expanded (thus using an iterative process to circumvent recursion). As seen in the code below, the contents of the input (callingTreeIn) table is copied and expanded and added to the output (callingTreeOut) table. Note that we use a constant value for our Key variable (defined below) since we need both (a) a key value and (b) want the data-values stored in each hash table in the order they are added. This feature of hash table keys is part of the reason we can solve this problem by using iterative logic instead of recursive.
if _n_ = 1 then do; /* create a Hash of Hashes Table that contains a hash table for each program or macro that calls another program or macro */ dcl hash HoH(); HoH.defineKey ("Program"); HoH.defineData("h","hiter","Program"); HoH.defineDone(); dcl hash h; dcl hiter hiter;
dcl hash callingTreeIn(multidata:"y"); callingTreeIn.defineKey("Key"); callingTreeIn.defineData("Key","Level","Calling","lineNumbers","Called","Expanded"); callingTreeIn.defineDone(); dcl hiter callingTreeInIter("callingTreeIn");
dcl hash callingTreeOut(multidata:"y"); callingTreeOut.defineKey("Key"); callingTreeOut.defineData("Key","Level","Calling","lineNumbers","Called","Expanded"); callingTreeOut.defineDone(); dcl hiter callingTreeOutIter("callingTreeOut"); end; /* create a Hash of Hashes Table that contains a hash table for each program or macro that calls another program or macro */
Our next action is to create a hash table for each calling program, and add it to our Hash of Hash table HoH. A few notes about this section of code:
do until(last.Program); /* create the individual hash tables for each macro that calls other macros */ set code.callingHierarchy end = lr; by Program notsorted; if first.Program then do; /* create the table of what it calls */ h = _new_ hash(multidata:"y"); hiter = _new_ hiter("h"); h.defineKey("_n_"); h.defineData("Program","Calls","lineNumbers"); h.defineDone(); HoH.add(); end; /* create the table of what it calls */ h.add(); /* add the completed hash table to the Hash of Hashes table */ end; if not lr then return;
Now that once we have created a hash table for each calling program, we can begin to create our calling hierarchy. We do that by calling the ADD method to load a single row into the callingTreeIn table. ADD is called explicitly, i.e. using its argument tags KEY and DATA. Note that there is no calling Program and thus no line numbers for our driver, hence the corresponding argument tags are set to blank. By loading the starting point this way, we can start our calling hierarchy at any node in the complete hierarchy. For example, you might want to create a calling hierarchy for a specific entry point (e.g., Extract_DataSourceA in the above listing).
/* create the complete calling hierarchy once all the macro call hash tables have been built */ /* first load the initial driver program */ Key = 1; Level = 1; callingTreeIn.add(Key:Key ,Data:Key ,Data:Level ,Data:" " ,Data:" " ,Data:"&Driver..sas" ,Data:0 );
So, we are now ready to begin iterating to create our calling hierarchy by looping thru the rows in our input hash table and adding them to our output hash table, while, at the same time, adding rows (expanding) for programs called by our current calling Program.
/* loop until the hierarchy is fully expanded */ do until(not foundHoH or maxLevel gt 100); /* limit to 100 to prevent an infinite loop */ foundHoH = 0; /* intialize a counter so we know we have gotten to the bottom */ do while(callingTreeInIter.next()=0); /* Unconditionally add the record and mark it as expanded in this loop */ callingTreeOut.add(Key:Key ,Data:Key ,Data:Level ,Data:Calling ,Data:lineNumbers ,Data:Called ,Data:1 ); if HOH.find(Key:Called) = 0 and not Expanded then do; /* the called macro calls other macros - expand it */ do while(hiter.next()=0); Expanded = 0; /* don't hard-code in call, suppress uninitialized message */ callingTreeOut.add(Key:Key ,Data:Key ,Data:Level+1 ,Data:Program ,Data:lineNumbers ,Data:Calls ,Data:Expanded ); foundHoH + 1; /* we have calling macros so we need to loop again */ end; end; /* the called macro calls other macros - expand it */ end; link prepNextLevel; end; callingTreeIn.output(dataset:"&Driver(drop=Key Calling Expanded"); stop;
As we prepare to iterate again, the content of callingTreeOut table needs to be used as the input to the next iteration. Thus, the LINK routine prepNextLevel code simply:
We are now ready to begin our next iteration.
prepNextLevel: /* load the data from the previous level to the input hash table and initialize the output hash table to empty */ callingTreeIn.clear(); do while(callingTreeOutIter.next()=0); maxLevel = max(maxLevel,Level); callingTreeIn.add(); end; callingTreeOut.clear(); return; run;
Our final step is to produce a listing like what is included above. Note that we want programs indented based on their level in the calling hierarchy. So we create a variable called stub whose value is the contcatenation of the Level and Called variables That value is concatenated to a string of non-breakable spaces (the "A0"X values). Those allow for the value to be indented regarding of whether we create rtf, pdf or html output.
A variable (Ord) is also created to force PROC REPORT to preserve the input order in the output report. The PROC REPORT step produces the output seen above
options nodate orientation=portrait; ods rtf file="&root\Reports\&Driver..rtf"; data report; set &Driver; ord+1; stub = cats(repeat("A0"x,(level-1)*5),catx(' - ',level,Called)); run; proc report data=report style(report) = [rules=none frame=void cellpadding=2 cellspacing=2]; title "&Driver Calling Hierarchy"; %let dt = %sysfunc(date(),worddate.); %let tm = %sysfunc(time(),timeampm8.); footnote "Generated at &tm on &dt"; columns ord Stub lineNumbers; define ord / order noprint; define Stub / "Level and Program" display; define lineNumbers / "At Line Number(s)"; run; ods rtf close;
The above code, along with the sample input data and output will be made available as a GitHub Project.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.