In our book Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study, @hashman and I (@DonH) use a number of very useful SAS programming techniques that may not be well known or commonly used. These techniques are particularly relevant when using the SAS hash object, but they apply much more broadly and can be used to address any number of issues. While the examples here focus on using a SET statement to define hash object variables to the DATA step and PDV (Program Data Vector), the concepts apply to any SAS DATA step program. So even if you are not interested in the SAS hash object, the discussion here presents alternative ways to use the SET statement purely for its compile-time function.
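The SET statement has a dual role in a SAS DATA step: at compile time it defines the variables in the data set to the PDV, using the attributes stored in the data set, and at execution time it reads observations from the data set into the PDV.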
This construct was first referenced in The SAS Supervisor (a tutorials presentation at the SAS conference in 1983).
There are a number of ways to include a SET statement in a DATA step so that it is never executed but still serves its compile-time function.
Typically such compile-time only SET statements should be included at the top of the DATA step so the variables in the data set are defined to the PDV using the attributes from the data set. A common technique to accomplish this is to use a conditionally executed SET statement with a condition that is never true. For example:
if 0 then set my-data-set-name;
Another is to include a SET statement after a STOP statement at the end of the DATA step, e.g.,
stop;
set my-data-set-name;
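One way to see the compile-time role in isolation is to create an empty copy of a data set. The sketch below is an illustration and is not taken from the book; the output data set name SHELL is a hypothetical name chosen here. Because the STOP statement ends the step before the SET statement can execute, no observations are read, yet the variables should still be defined with the attributes from SASHELP.CLASS.

```sas
/* SHELL is a hypothetical name used only for this illustration. */
data shell;
  stop;                 /* ends the step during the first iteration     */
  set sashelp.class;    /* never executes; only defines the variables   */
                        /* to the PDV at compile time                   */
run;

/* List the variables and attributes picked up from SASHELP.CLASS. */
proc contents data=shell;
run;
```

PROC CONTENTS should report zero observations but the full set of variables, which is exactly the compile-time effect the techniques above rely on.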
In the book both of these techniques are used. A primary reason for using a conditionally executed SET (i.e., a SET statement in the THEN clause of the IF statement, regardless of whether it is ever executed) at the top of the DATA step vs. a SET statement at the end of the DATA step is driven by how a SAS hash object is defined. Consider the following DATA step (and LOG notes) which defines a hash object and loads selected columns from the SASHELP.CLASS data set.
92 data _null_;
93 dcl hash class(dataset:"sashelp.class");
94 class.defineKey("Name");
95 class.defineData("Name","Height","Weight");
96 class.defineDone();
97 run;
ERROR: Undeclared key symbol Name for hash object at line 96 column 2.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
The second ERROR message says the problem occurred at execution time. The first message points to line 96 because it is the defineDone method call that detects the absence of host PDV counterparts for the variable names specified by defineKey and defineData. All three of these method calls happen at execution time. That suggests that as long as the characteristics of Name are defined to the DATA step before execution time, we should be OK. The SAS logs for the following two DATA steps (one that uses a conditionally executed SET at the top of the DATA step, and one that uses a SET statement after a STOP statement) confirm that. Both approaches work, as the following SAS log snippets illustrate.
98 data _null_;
99 if 0 then set sashelp.class;
100 dcl hash class(dataset:"sashelp.class");
101 class.defineKey("Name");
102 class.defineData("Name","Height","Weight");
103 class.defineDone();
104 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
105 data _null_;
106 dcl hash class(dataset:"sashelp.class");
107 class.defineKey("Name");
108 class.defineData("Name","Height","Weight");
109 class.defineDone();
110 stop;
111 set sashelp.class;
112 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
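For contrast, the host variables can also be declared by hand. The following is a minimal sketch, not from the book, that assumes Name is a character variable of length 8 and that Height and Weight are numeric in SASHELP.CLASS. It shows what the compile-time SET statement saves us from: hard-coded attributes that have to be kept in sync with the data set.

```sas
data _null_;
  /* Assumed attributes: these must match SASHELP.CLASS and are */
  /* maintained by hand, unlike with a compile-time SET.        */
  length Name $8 Height Weight 8;
  dcl hash class(dataset:"sashelp.class");
  class.defineKey("Name");
  class.defineData("Name","Height","Weight");
  class.defineDone();
  /* Avoids "uninitialized" notes for variables never assigned. */
  call missing(Name, Height, Weight);
run;
```

With the SET statement approach, the attributes always come from the data set itself, so a change to the data set does not require a change to the program.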
Suppose we decide to use the SET after the STOP technique and then augment our program to actually do something, in this case perform a search (AKA a table lookup):
113 %let who = Jane;
114 data _null_;
115 dcl hash class(dataset:"sashelp.class");
116 class.defineKey("Name");
117 class.defineData("Name","Height","Weight");
118 class.defineDone();
119 if class.find(key:"&who")=0 then put (Name Height Weight) (=);
120 stop;
121 set sashelp.class;
ERROR: Variable Name has been defined as both character and numeric.
122 run;
NOTE: The SAS System stopped processing this step because of errors.
The issue is that the first reference to the variable Name that the compiler sees is now on line 119. With no other information available, that reference defines Name as a numeric variable. So when the SET statement on line 121 is encountered later in the same compile phase, Name is character in the data set but already numeric in the PDV, and we have a conflict. The references to Name in the hash object method calls on lines 116 and 117 are character literals from the perspective of the compiler; they are not interpreted as variable names until execution time of the DATA step.
If we change the program to use a conditionally executed SET at the top of the DATA step, the program runs without any errors.
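The default-to-numeric behavior can be seen on its own with a step that does nothing but reference an otherwise undefined variable. This is a minimal sketch for illustration only; it involves no hash object and no data set.

```sas
data _null_;
  /* First and only reference to Name: the compiler has no other */
  /* information, so it creates Name as a numeric variable.      */
  put Name=;
run;
```

The log should show Name as a missing numeric value (a period), along with a note that the variable is uninitialized.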
123 %let who = Jane;
124 data _null_;
125 if 0 then set sashelp.class;
126 dcl hash class(dataset:"sashelp.class");
127 class.defineKey("Name");
128 class.defineData("Name","Height","Weight");
129 class.defineDone();
130 if class.find(key:"&who")=0 then put (Name Height Weight) (=);
131 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
Name=Jane Height=59.8 Weight=84.5
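So a general rule of thumb is to use a SET statement to define variables to the PDV when variables from that data set are to be loaded as hash object key or data items. Whether the placement of that SET statement matters depends on how the variables are referenced in the DATA step program. If they appear only as quoted strings in hash object method calls, the SET statement can go either at the top (conditionally executed) or after a STOP statement at the end. If they are also referenced as variable names, as in the PUT statement above, the SET statement should appear before the first such reference.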
Thanks @DonH for the clear explanations.
I'd like to mention that variables defined in a SET statement are automatically RETAINed. This applies to "compile-time only SET statements" as well, although it affects variable values at run-time. Sometimes it's necessary to reset those retained values by using assignment statements or CALL MISSING.
Agreed @FreelanceReinh. The implied retain and call missing is something planned for another article. But good point that it should have been mentioned here.
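Until that article appears, here is a minimal sketch of the point raised in the comments above. It is an illustration only: the LOOKUPS data set, the variable who, and the name Nobody are hypothetical and were made up for this example. Because Name, Height, and Weight are defined by a SET statement, they are retained across DATA step iterations, so a failed lookup would otherwise leave the values from the previous successful lookup in the PDV.

```sas
/* LOOKUPS and WHO are hypothetical names used for illustration. */
data lookups;
  length who $8;
  input who $;
  datalines;
Jane
Nobody
;

data _null_;
  if 0 then set sashelp.class;     /* compile-time only: defines Name etc. */
  if _n_ = 1 then do;              /* build the hash table once            */
    dcl hash class(dataset:"sashelp.class");
    class.defineKey("Name");
    class.defineData("Name","Height","Weight");
    class.defineDone();
  end;
  set lookups;                     /* one iteration per name to look up    */
  /* On a failed search, clear the values retained from the last hit. */
  if class.find(key:who) ne 0 then call missing(Name, Height, Weight);
  put who= Name= Height= Weight=;
run;
```

Removing the CALL MISSING should make the line for the unmatched name show the values left over from the preceding successful lookup, which is the retained-value effect described above.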