BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ranjeeta
Pyrite | Level 9

 

I have a dataset labelled as Final1 of patients with 84 records which has INST_NO i.e. the institution where they received some care

I want to compare the above dataset with another dataset (Inst_dsc) in which I have a list of Institution Nos labelled as INST_AT

How would i write a code to check if the INST_No in the 1st dataset is present in the 2nd dataset 

Basically want to flag the cases or records 

 

 

data match (Keep=INST_No INST_AT);

1227  length inst_AT 8 ;

1228  if _N_ then do;

1229  declare hash h (dataset:"work.inst_dsc");

1230  h.definekey ('INST_AT');

1231  call missing (INST_AT);

1232  end;

1233  set Final1;

1234  rc=h.find(INST_AT);

1235  if rc=0 then output;

1236  run;

 

ERROR: Uninitialized keys for hash object at line 1234 column 4.

ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 1 observations read from the data set Final1.

WARNING: The data set WORK.MATCH may be incomplete.  When this step was stopped there were 0 observations and 2 variables.

WARNING: Data set WORK.MATCH was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

 

 

1237  proc print ;

1238  run;

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read

rc=h.find(key:INST_No);

Or simplify the code by combining this with the subsequent IF-THEN statement:

if h.find(key:INST_No)=0 then output;

Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User
if _N_ then do;

means that you redefine the hash in every data step iteration (all numeric values except missing and zero are true).

Change to

if _N_ = 1 then do;

You are also missing a call to the definedata() (if you have data columns) and definedone() methods.

Ranjeeta
Pyrite | Level 9

Thankyou Appreciate it

FreelanceReinh
Jade | Level 19

I would like to point out that the primary mistake was the omission of definedone();. This caused the error messages in the log. Had it been inserted, the first error message would have changed to

ERROR: Unknown argument tag at line ... column 4. Tag must be key.

which might have helped you to find the solution. So, the most important response was actually @Kurt_Bremser's.

FreelanceReinh
Jade | Level 19

Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read

rc=h.find(key:INST_No);

Or simplify the code by combining this with the subsequent IF-THEN statement:

if h.find(key:INST_No)=0 then output;

Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).

hashman
Ammonite | Level 13

@Ranjeeta:

Your entire step can be condensed to this:

data match ;                           
  if _n_ = 1 then do ;                 
    dcl hash h (dataset:"inst_dsc") ;  
    h.definekey ("inst_at") ;          
    h.definedone () ;                  
  end ;                                
  set final1 (keep = inst_no inst_at) ;
  if h.check() = 0 ;                    
run ;                                  

Note:

  • You need neither LENGTH nor CALL MISSING, as the hash key is on file FINAL1, and parameter type matching (i.e. setting the hash host PDV variables) occurs when the compiler sees the SET reference to FINAL1. Most of the time, LENGTH and CALL MISSING aren't needed - they are just leftovers from the maladroitness in the documentation.
  • As @FreelanceReinh and @Kurt_Bremser have pointed out, _N_=1 is the condition you need to avoid (a) creating a new hash object instant at every observation read from FINAL1 and (b) loading the hash table at every observation.
  • Keep the only variables you need on input, not on output, to prevent polluting the PDV with extraneous stuff and degrading performance.
  • There's no need to use an assigned (i.e. RC=) method call because a call embedded in the IF statement prevents generating KEY NOT FOUND errors if there's no match.
  • Likewise, there's no need for the OUTPUT statement since you can just use the subsetting IF.
  • Since you don't need to retrieve any satellite information from the data portion of the hash table, you needn't use the FIND method. You're only checking if there's a match; so the CHECK method will suffice, not to mention that it's faster than FIND for  the obvious reason that it doesn't entail the extra burden of moving data from the hash table to the PDV host variables. 

Kind regards

Paul D.

hashman
Ammonite | Level 13

@Kurt_Bremser:

Yes, it can be coded this way. However, unfortunately, the way SAS designed the return codes is counter-intuitive from the standpoint of the normal SAS Boolean conventions. Hence, for a SAS programmer not quite familiar with the way the hash methods' return codes are set up, IF NOT H.FIND() is semantically perceived as "if key not found", while it's just the opposite.

 

This incongruence has caused quite a bit of costernation to my coauthor @DonH when he was loading himself with the SAS hash knowledge for the same reason as to me: Naturally, he and I would much rather prefer to have RC=1 if the call is successful and RC=0 otherwise. If that were the case, IF H.FIND() would mean "found" and IF NOT H.FIND() would mean "not found", and no confusion would arise.

 

More unfortunately, some hash object methods offer argument tags assigned a variable name whose value is set to 1 in case of success and to 0 in case of failure, just the way the things ought to be. For all these reasons, I prefer to code return codes explicitly rather than to turn them into Booleans, just to avoid disorientation both for myself and those reading my hash code.

 

Kind regards

Paul D.  

FreelanceReinh
Jade | Level 19

@hashman: Very clear explanations, as always. I may be wrong, but my understanding was that variable inst_at is in dataset inst_dsc, but not in dataset Final1. In this case I would insert

  inst_at=inst_no;

before the subsetting IF statement to support the simplifications you made to the code (and, of course, remove inst_at from the KEEP list).

hashman
Ammonite | Level 13

@FreelanceReinh:

You may very well be correct on that head. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3642 views
  • 15 likes
  • 4 in conversation