I have a dataset labelled as Final1 of patients with 84 records which has INST_NO i.e. the institution where they received some care
I want to compare the above dataset with another dataset (Inst_dsc) in which I have a list of Institution Nos labelled as INST_AT
How would i write a code to check if the INST_No in the 1st dataset is present in the 2nd dataset
Basically want to flag the cases or records
data match (Keep=INST_No INST_AT);
1227 length inst_AT 8 ;
1228 if _N_ then do;
1229 declare hash h (dataset:"work.inst_dsc");
1230 h.definekey ('INST_AT');
1231 call missing (INST_AT);
1232 end;
1233 set Final1;
1234 rc=h.find(INST_AT);
1235 if rc=0 then output;
1236 run;
ERROR: Uninitialized keys for hash object at line 1234 column 4.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set Final1.
WARNING: The data set WORK.MATCH may be incomplete. When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.MATCH was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1237 proc print ;
1238 run;
Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read
rc=h.find(key:INST_No);
Or simplify the code by combining this with the subsequent IF-THEN statement:
if h.find(key:INST_No)=0 then output;
Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).
if _N_ then do;
means that you redefine the hash in every data step iteration (all numeric values except missing and zero are true).
Change to
if _N_ = 1 then do;
You are also missing a call to the definedata() (if you have data columns) and definedone() methods.
Thankyou Appreciate it
I would like to point out that the primary mistake was the omission of definedone();. This caused the error messages in the log. Had it been inserted, the first error message would have changed to
ERROR: Unknown argument tag at line ... column 4. Tag must be key.
which might have helped you to find the solution. So, the most important response was actually @Kurt_Bremser's.
Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read
rc=h.find(key:INST_No);
Or simplify the code by combining this with the subsequent IF-THEN statement:
if h.find(key:INST_No)=0 then output;
Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).
Your entire step can be condensed to this:
data match ;
if _n_ = 1 then do ;
dcl hash h (dataset:"inst_dsc") ;
h.definekey ("inst_at") ;
h.definedone () ;
end ;
set final1 (keep = inst_no inst_at) ;
if h.check() = 0 ;
run ;
Note:
Kind regards
Paul D.
One thing I might change:
if h.check() = 0 ;
to
if not h.check();
Yes, it can be coded this way. However, unfortunately, the way SAS designed the return codes is counter-intuitive from the standpoint of the normal SAS Boolean conventions. Hence, for a SAS programmer not quite familiar with the way the hash methods' return codes are set up, IF NOT H.FIND() is semantically perceived as "if key not found", while it's just the opposite.
This incongruence has caused quite a bit of costernation to my coauthor @DonH when he was loading himself with the SAS hash knowledge for the same reason as to me: Naturally, he and I would much rather prefer to have RC=1 if the call is successful and RC=0 otherwise. If that were the case, IF H.FIND() would mean "found" and IF NOT H.FIND() would mean "not found", and no confusion would arise.
More unfortunately, some hash object methods offer argument tags assigned a variable name whose value is set to 1 in case of success and to 0 in case of failure, just the way the things ought to be. For all these reasons, I prefer to code return codes explicitly rather than to turn them into Booleans, just to avoid disorientation both for myself and those reading my hash code.
Kind regards
Paul D.
@hashman: Very clear explanations, as always. I may be wrong, but my understanding was that variable inst_at is in dataset inst_dsc, but not in dataset Final1. In this case I would insert
inst_at=inst_no;
before the subsetting IF statement to support the simplifications you made to the code (and, of course, remove inst_at from the KEEP list).
You may very well be correct on that head.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.