- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset labelled as Final1 of patients with 84 records which has INST_NO i.e. the institution where they received some care
I want to compare the above dataset with another dataset (Inst_dsc) in which I have a list of Institution Nos labelled as INST_AT
How would i write a code to check if the INST_No in the 1st dataset is present in the 2nd dataset
Basically want to flag the cases or records
data match (Keep=INST_No INST_AT);
1227 length inst_AT 8 ;
1228 if _N_ then do;
1229 declare hash h (dataset:"work.inst_dsc");
1230 h.definekey ('INST_AT');
1231 call missing (INST_AT);
1232 end;
1233 set Final1;
1234 rc=h.find(INST_AT);
1235 if rc=0 then output;
1236 run;
ERROR: Uninitialized keys for hash object at line 1234 column 4.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set Final1.
WARNING: The data set WORK.MATCH may be incomplete. When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.MATCH was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1237 proc print ;
1238 run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read
rc=h.find(key:INST_No);
Or simplify the code by combining this with the subsequent IF-THEN statement:
if h.find(key:INST_No)=0 then output;
Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if _N_ then do;
means that you redefine the hash in every data step iteration (all numeric values except missing and zero are true).
Change to
if _N_ = 1 then do;
You are also missing a call to the definedata() (if you have data columns) and definedone() methods.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thankyou Appreciate it
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would like to point out that the primary mistake was the omission of definedone();. This caused the error messages in the log. Had it been inserted, the first error message would have changed to
ERROR: Unknown argument tag at line ... column 4. Tag must be key.
which might have helped you to find the solution. So, the most important response was actually @Kurt_Bremser's.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Moreover, in the FIND method you forgot the "key:" argument tag and used the wrong variable name. It should read
rc=h.find(key:INST_No);
Or simplify the code by combining this with the subsequent IF-THEN statement:
if h.find(key:INST_No)=0 then output;
Once you've corrected this and the issues KurtBremser mentioned (definedata(...) is not required, though, only optional) your program should work. Of course, it doesn't really flag the matching cases, but selects them from dataset Final1. This could be changed easily by replacing the OUTPUT statement with an assignment statement such as flag=1 (and adding flag to the KEEP list).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your entire step can be condensed to this:
data match ;
if _n_ = 1 then do ;
dcl hash h (dataset:"inst_dsc") ;
h.definekey ("inst_at") ;
h.definedone () ;
end ;
set final1 (keep = inst_no inst_at) ;
if h.check() = 0 ;
run ;
Note:
- You need neither LENGTH nor CALL MISSING, as the hash key is on file FINAL1, and parameter type matching (i.e. setting the hash host PDV variables) occurs when the compiler sees the SET reference to FINAL1. Most of the time, LENGTH and CALL MISSING aren't needed - they are just leftovers from the maladroitness in the documentation.
- As @FreelanceReinh and @Kurt_Bremser have pointed out, _N_=1 is the condition you need to avoid (a) creating a new hash object instant at every observation read from FINAL1 and (b) loading the hash table at every observation.
- Keep the only variables you need on input, not on output, to prevent polluting the PDV with extraneous stuff and degrading performance.
- There's no need to use an assigned (i.e. RC=) method call because a call embedded in the IF statement prevents generating KEY NOT FOUND errors if there's no match.
- Likewise, there's no need for the OUTPUT statement since you can just use the subsetting IF.
- Since you don't need to retrieve any satellite information from the data portion of the hash table, you needn't use the FIND method. You're only checking if there's a match; so the CHECK method will suffice, not to mention that it's faster than FIND for the obvious reason that it doesn't entail the extra burden of moving data from the hash table to the PDV host variables.
Kind regards
Paul D.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One thing I might change:
if h.check() = 0 ;
to
if not h.check();
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, it can be coded this way. However, unfortunately, the way SAS designed the return codes is counter-intuitive from the standpoint of the normal SAS Boolean conventions. Hence, for a SAS programmer not quite familiar with the way the hash methods' return codes are set up, IF NOT H.FIND() is semantically perceived as "if key not found", while it's just the opposite.
This incongruence has caused quite a bit of costernation to my coauthor @DonH when he was loading himself with the SAS hash knowledge for the same reason as to me: Naturally, he and I would much rather prefer to have RC=1 if the call is successful and RC=0 otherwise. If that were the case, IF H.FIND() would mean "found" and IF NOT H.FIND() would mean "not found", and no confusion would arise.
More unfortunately, some hash object methods offer argument tags assigned a variable name whose value is set to 1 in case of success and to 0 in case of failure, just the way the things ought to be. For all these reasons, I prefer to code return codes explicitly rather than to turn them into Booleans, just to avoid disorientation both for myself and those reading my hash code.
Kind regards
Paul D.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@hashman: Very clear explanations, as always. I may be wrong, but my understanding was that variable inst_at is in dataset inst_dsc, but not in dataset Final1. In this case I would insert
inst_at=inst_no;
before the subsetting IF statement to support the simplifications you made to the code (and, of course, remove inst_at from the KEEP list).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You may very well be correct on that head.