The SET Statement's Compile Time Functions

Started ‎08-21-2018 by
Modified ‎08-21-2018 by

In our book, Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study, @hashman and I (@DonH) use a number of SAS programming techniques that may not be well known or commonly used. These techniques are particularly relevant to the SAS hash object, but they apply much more broadly. While the examples here focus on using a SET statement to define hash object variables to the DATA step's PDV (Program Data Vector), the concepts apply to any SAS DATA step program. So even if you are not interested in the SAS hash object, the discussion here shows how to use the SET statement purely for its compile-time function.

 

The SET statement has a dual role in a SAS DATA step:

  1. At compile time, SAS looks at the directory for the data set(s) in order to define the variables found in the data set to the Program Data Vector (PDV).

  2. At execution time, it reads the next observation from the named data set(s).

This construct was first referenced in The SAS Supervisor (a tutorial presentation at the SAS Conference in 1983).

 

There are a number of ways to include a SET statement in a DATA step so that it is never executed but still serves its compile-time function.

 

Typically, such compile-time-only SET statements should be placed at the top of the DATA step so that the variables in the data set are defined to the PDV using the attributes from the data set. A common technique is to use a conditionally executed SET statement with a condition that is never true. For example:

 

if 0 then set my-data-set-name;

 

Another is to include a SET statement after a STOP statement at the end of the DATA step, e.g.,

 

stop;
set my-data-set-name;
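
As a minimal sketch of both placements, the two steps below each create an empty output data set that should still carry the variables of SASHELP.CLASS, because the SET statement is compiled but never executed. The output names work.shell_top and work.shell_end are hypothetical, chosen only for illustration, and the STOP in the first step is an added precaution so that no row is written.

```sas
/* Placement 1: SET guarded by a condition that is never true. */
data work.shell_top;            /* hypothetical output name */
  if 0 then set sashelp.class;  /* compiled, never executed */
  stop;                         /* end the step before any row is written */
run;

/* Placement 2: SET placed after STOP, so execution never reaches it. */
data work.shell_end;            /* hypothetical output name */
  stop;
  set sashelp.class;            /* compiled, never executed */
run;

/* Inspect the variable attributes that the compile-time SET defined. */
proc contents data=work.shell_top; run;
proc contents data=work.shell_end; run;
```

Comparing the two PROC CONTENTS listings is one way to check that both placements define the same variables with the same attributes.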

 

In the book both of these techniques are used. A primary reason for using a conditionally executed SET (i.e., a SET statement in the THEN clause of the IF statement, regardless of whether it is ever executed) at the top of the DATA step vs. a SET statement at the end of the DATA step is driven by how a SAS hash object is defined. Consider the following DATA step (and LOG notes) which defines a hash object and loads selected columns from the SASHELP.CLASS data set.

 

92   data _null_;
93    dcl hash class(dataset:"sashelp.class");
94    class.defineKey("Name");
95    class.defineData("Name","Height","Weight");
96    class.defineDone();
97   run;

 

ERROR: Undeclared key symbol Name for hash object at line 96 column 2.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

 

The second ERROR message says the problem occurred at execution time. The first message points to line 96 because it is the defineDone method call that detects the absence of host PDV counterparts for the variable names specified by defineKey and defineData. All three of these method calls happen at execution time. That suggests that as long as the characteristics of Name are defined to the DATA step before execution time, we should be OK. The SAS logs for the following two DATA steps (one that uses a conditionally executed SET at the top of the DATA step, and one that uses a SET statement after a STOP statement) confirm that: both approaches work.

 

98   data _null_;
99    if 0 then set sashelp.class;
100   dcl hash class(dataset:"sashelp.class");
101   class.defineKey("Name");
102   class.defineData("Name","Height","Weight");
103   class.defineDone();
104  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

 


105  data _null_;
106   dcl hash class(dataset:"sashelp.class");
107   class.defineKey("Name");
108   class.defineData("Name","Height","Weight");
109   class.defineDone();
110   stop;
111   set sashelp.class;
112  run;


NOTE: There were 19 observations read from the data set SASHELP.CLASS.


So we decide to use the SET after the STOP technique once we augment our program to actually do something; in this case perform a search (AKA a table lookup):

 

113  %let who = Jane;
114  data _null_;
115   dcl hash class(dataset:"sashelp.class");
116   class.defineKey("Name");
117   class.defineData("Name","Height","Weight");
118   class.defineDone();
119   if class.find(key:"&who")=0 then put (Name Height Weight) (=);
120   stop;
121   set sashelp.class;
ERROR: Variable Name has been defined as both character and numeric.
122  run;

NOTE: The SAS System stopped processing this step because of errors.

 

The issue is that the first reference to the variable Name is now on line 119. Because Name has not been defined at that point, the compiler defaults it to numeric. When the compiler then reaches the SET statement on line 121, the data set describes Name as character, and the two definitions conflict. The references to Name in the hash object method calls on lines 116 and 117 are character literals from the perspective of the compiler; they are not interpreted as variable names until the DATA step executes.
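
The same conflict can be reproduced without a hash object. The step below is a minimal sketch, not taken from the book, and it is expected to fail at compile time: the PUT statement is the first reference to Name, so Name is treated as numeric before the compiler reaches the SET statement.

```sas
/* Minimal reproduction; this step is expected to FAIL at compile time. */
data _null_;
  put Name=;           /* first reference: Name is implicitly numeric */
  stop;
  set sashelp.class;   /* data set defines Name as character: conflict */
run;
```

Moving the SET statement ahead of the PUT statement, for example as if 0 then set sashelp.class; at the top of the step, should remove the conflict because the data set's definition of Name is then seen first.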

 

If we change the program to use a conditionally executed SET at the top of the DATA step, the program runs without any errors.

 

123  %let who = Jane;
124  data _null_;
125   if 0 then set sashelp.class;
126   dcl hash class(dataset:"sashelp.class");
127   class.defineKey("Name");
128   class.defineData("Name","Height","Weight");
129   class.defineDone();
130   if class.find(key:"&who")=0 then put (Name Height Weight) (=);
131  run;


NOTE: There were 19 observations read from the data set SASHELP.CLASS.

Name=Jane Height=59.8 Weight=84.5

 

So a general rule of thumb is to use a SET statement to define variables to the PDV when variables from that data set are to be loaded as hash object key or data items. Whether the placement of that SET statement matters depends on how the variables are referenced in the DATA step program:

 

  • If they are only referenced as quoted strings in hash object method calls, the placement of the SET statement (whether executed or not) does not matter.

  • Otherwise (they are referenced as variable names) a SET statement (whether executed or not) before such references should be used.

  • Remember that variable type and length are defined to the PDV based on their first reference in a DATA step (exclusive of DROP, KEEP and RETAIN statements).
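
The first-reference rule in the last bullet can be checked with a short sketch. Here a LENGTH statement is the first reference to Name, so the compile-time SET that follows should leave that length alone, while Height is first defined by the SET. The length of 20 and the variable names name_len and height_typ are assumptions made only for this illustration.

```sas
data _null_;
  length Name $ 20;              /* first reference: character, length 20 (illustrative) */
  if 0 then set sashelp.class;   /* Name is already defined; Height is defined here */
  name_len   = vlength(Name);    /* length of Name as held in the PDV */
  height_typ = vtype(Height);    /* 'N' for numeric, 'C' for character */
  put name_len= height_typ=;
  stop;                          /* no SET ever executes, so end the step explicitly */
run;
```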

Comments

Thanks @DonH for the clear explanations.

 

I'd like to mention that variables defined in a SET statement are automatically RETAINed. This applies to "compile-time only SET statements" as well, although it affects variable values at run-time. Sometimes it's necessary to reset those retained values by using assignment statements or CALL MISSING.

Agreed @FreelanceReinh. The implied RETAIN and CALL MISSING are planned for another article. But good point that it should have been mentioned here.
