BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ScoobySASEG
Fluorite | Level 6

In the following minimal example of declaring and using a hash object inside a data step, my understanding is that:

 

You have to include both "if 0 then set A;" and  "call missing(k);" , if you don't want SAS to complain with "ERROR: Undeclared key symbol for hash object at line 31 column 7."

 

Can someone please confirm my understanding? Or am I confusing two different concepts? Thanks in advance.

 

@PeterClemmensen Would highly appreciate your input since this post was a great help!

 

SAS Version: SAS Enterprise Guide 8.3 Update 8 (64-bit)

 

data _null_;
   if 0 then set A
   if _N_ = 1 then do;
      declare hash h(dataset:'A');
      h.defineKey('k');
      call missing(k);
   end;

set B;
if h.find()=0 then put "Key exists in A: " k=; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

No.  Both are not required.  And for that particular example neither is needed since the variable K used in the HASH object must already exist in dataset B for the code to make any sense.

81   data a b ;
82     do k=1 to 2; output a; end;
83     do k=1 to 4; output b; end;
84     stop;
85   run;

NOTE: The data set WORK.A has 2 observations and 1 variables.
NOTE: The data set WORK.B has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


86
87   data _null_;
88     if _N_ = 1 then do;
89       declare hash h(dataset:'A');
90       h.defineKey('k');
91       h.defineDone();
92     end;
93
94     set B;
95     if h.find()=0 then put "Key exists in A: " k=;
96     else put "Key does NOT exist in A: " k=;
97   run;

NOTE: There were 2 observations read from the data set WORK.A.
Key exists in A: k=1
Key exists in A: k=2
Key does NOT exist in A: k=3
Key does NOT exist in A: k=4
NOTE: There were 4 observations read from the data set WORK.B.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The purpose of the unexecuted SET statement is to let the data step compiler see the variables you are using from that dataset.  That is needed because any variables you are referencing in the DEFINEKEY() and DEFINEDATA() method calls are "invisible" to the data step compiler.  They just look like sting literals.  So if those variables are never defined some where in the data step then you get a RUN TIME error when the hash object is created.

4    data test;
5      set sashelp.class(obs=1);
6      declare hash h();
7      h.definekey('fred');
8      h.definedone();
9    run;

ERROR: Undeclared key symbol fred for hash object at line 8 column 3.
ERROR: DATA STEP Component Object failure.
       Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set SASHELP.CLASS.
WARNING: The data set WORK.TEST may be incomplete.  When this step was
         stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

You don't need to use the SET statement to define the variables. You could actual define the variables using LENGTH or ATTRIB statements.  

 

One use of the CALL MISSING() statement is to avoid a message about a variable not be initiated.  There are other ways to do that.  You could initiate the variable with an assignment statement instead.  Or you could include the variable in an array definition.  The use of CALL MISSING() just make it easier.

 

But another use of CALL MISSING() statement is important when using this type of HASH() lookup.  And that is using CALL MISSING() for its original purpose, to set variable to missing values.  You should use that when the FIND() method did not find an entry.  That is especially important when using the unexecuted SET statement to define the variable because variables that appear in source datasets are RETAINED (that is not set to missing at start of each iteration of the dataset loop).  So to prevent lookup values from being carried forward it is important to clear those values when the lookup fails.

 

Try this example:

data newclass ;
  length name $8 ;
  do name='Alfred','Sam';
    output;
  end;
run;

data want;
  if 0 then set sashelp.class(keep=name age);
  if _N_ = 1 then do;
    declare hash h(dataset:'sashelp.class');
    h.defineKey('name');
    h.definedata('age');
    h.defineDone();
  end;

  set newclass;
  put name= @;
  if h.find()=0 then put age=;
  else do;
    call missing(age);
    put age= " Could not find age";
  end;
run;

proc print;
run;

Now try running it without the CALL MISSING() statement.  What value of AGE did you get for NAME="Sam"?

Now add remove the IF 0 ... statement and add a LENGTH AGE 8; statement after the SET NEWCLASS statement.  What value do you get now?

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Your code won't work because of a missing semicolon 😉

 

The purpose of

if 0 then set a;

is to make the data step compiler fetch the dataset's metadata and include the variables in the PDV; this statement automates the definition of the variables for the rest of the DATA step. You therefore do not need a LENGTH/FORMAT statement.

CALL MISSING, OTOH, avoids the NOTE about uninitialized variables.

 

Maxim 4: try your code with and without these statements to see what happens.

ScoobySASEG
Fluorite | Level 6

Thank you so much, @Kurt_Bremser !!!

 

I will certainly take a look at those maxiums and try to apply them.

 

And yes! the forsaken semi colon 🙂 bane of my existence 

Tom
Super User Tom
Super User

No.  Both are not required.  And for that particular example neither is needed since the variable K used in the HASH object must already exist in dataset B for the code to make any sense.

81   data a b ;
82     do k=1 to 2; output a; end;
83     do k=1 to 4; output b; end;
84     stop;
85   run;

NOTE: The data set WORK.A has 2 observations and 1 variables.
NOTE: The data set WORK.B has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


86
87   data _null_;
88     if _N_ = 1 then do;
89       declare hash h(dataset:'A');
90       h.defineKey('k');
91       h.defineDone();
92     end;
93
94     set B;
95     if h.find()=0 then put "Key exists in A: " k=;
96     else put "Key does NOT exist in A: " k=;
97   run;

NOTE: There were 2 observations read from the data set WORK.A.
Key exists in A: k=1
Key exists in A: k=2
Key does NOT exist in A: k=3
Key does NOT exist in A: k=4
NOTE: There were 4 observations read from the data set WORK.B.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The purpose of the unexecuted SET statement is to let the data step compiler see the variables you are using from that dataset.  That is needed because any variables you are referencing in the DEFINEKEY() and DEFINEDATA() method calls are "invisible" to the data step compiler.  They just look like sting literals.  So if those variables are never defined some where in the data step then you get a RUN TIME error when the hash object is created.

4    data test;
5      set sashelp.class(obs=1);
6      declare hash h();
7      h.definekey('fred');
8      h.definedone();
9    run;

ERROR: Undeclared key symbol fred for hash object at line 8 column 3.
ERROR: DATA STEP Component Object failure.
       Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set SASHELP.CLASS.
WARNING: The data set WORK.TEST may be incomplete.  When this step was
         stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

You don't need to use the SET statement to define the variables. You could actual define the variables using LENGTH or ATTRIB statements.  

 

One use of the CALL MISSING() statement is to avoid a message about a variable not be initiated.  There are other ways to do that.  You could initiate the variable with an assignment statement instead.  Or you could include the variable in an array definition.  The use of CALL MISSING() just make it easier.

 

But another use of CALL MISSING() statement is important when using this type of HASH() lookup.  And that is using CALL MISSING() for its original purpose, to set variable to missing values.  You should use that when the FIND() method did not find an entry.  That is especially important when using the unexecuted SET statement to define the variable because variables that appear in source datasets are RETAINED (that is not set to missing at start of each iteration of the dataset loop).  So to prevent lookup values from being carried forward it is important to clear those values when the lookup fails.

 

Try this example:

data newclass ;
  length name $8 ;
  do name='Alfred','Sam';
    output;
  end;
run;

data want;
  if 0 then set sashelp.class(keep=name age);
  if _N_ = 1 then do;
    declare hash h(dataset:'sashelp.class');
    h.defineKey('name');
    h.definedata('age');
    h.defineDone();
  end;

  set newclass;
  put name= @;
  if h.find()=0 then put age=;
  else do;
    call missing(age);
    put age= " Could not find age";
  end;
run;

proc print;
run;

Now try running it without the CALL MISSING() statement.  What value of AGE did you get for NAME="Sam"?

Now add remove the IF 0 ... statement and add a LENGTH AGE 8; statement after the SET NEWCLASS statement.  What value do you get now?

ScoobySASEG
Fluorite | Level 6
Thank you so much for this detailed reply!! As a new sas user I am amazed at how helpful this community is !!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 379 views
  • 7 likes
  • 3 in conversation