Hi,
I tested this code
data h ;
  input key hdata ;
  cards ;
1 5
1 10
1 15
2 9
2 11
2 16
3 20
3 100
4 6
5 5
5 99
;
run ;
data _null_ ;
  if 0 then set H ;
  dcl hash H (dataset: "H", multidata: "Y", ordered: "A") ;
  h.definekey ("key") ;
  h.definedata ("key", "hdata") ;
  h.definedone () ;
  do key = 1, 2, 5 ;
    put '01 - key est ' key;
    do find_rc = h.find() by 0 while (find_rc = 0) ;
      put ' au do find, find_rc est ' find_rc;
      put ' 02 - hdata est ' hdata;
      if hdata > 9 then remove_rc = h.removedup() ;
      put ' 03 - remove_rc est ' remove_rc;
      find_rc = h.find_next() ;
      put ' 04 - find_rc est ' find_rc;
    end ;
  end ;
  dcl hiter ih ("H") ;
  do while (ih.next() = 0) ;
    put key= hdata= ;
  end ;
run ;
AND GOT THIS RESULT:
Hi @Mij,
Yes, some of the hash object methods (including REMOVEDUP, not to mention SUMDUP ...) have their intricacies. Not all of those details are easily available in the documentation or in books about the subject. At some point you may want to perform your own experiments.
The particular issue in your code, however, is well described and solved in sections 4.3.11 and 4.3.12 of the book [1] Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study and also in chapter 5 (pp. 129, 136 f.) of the book [2] SAS® Hash Object Programming Made Easy.
Once REMOVEDUP has removed an item from a same-key item group, SAS (unfortunately) forgets where it was in that group: it "unsets the item list" ([1], p. 90) or, in other words, "sets the pointers to null" ([2], p. 129). Hence the nonzero return code 160038 you received from the FIND_NEXT method. To remove more (but not all) items of the group, you need to set the pointer to the first item of that group again (using FIND or DO_OVER or CHECK or REF), move the pointer to the next item to be deleted (using FIND_NEXT/FIND_PREV or DO_OVER), if necessary, and repeat these steps until all deletions are done. The HAS_NEXT method can be used in this process to find out whether the group contains more items, which may or may not qualify for deletion (see program 4.12 in [1], p. 91).
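A minimal, untested sketch of that restart-and-repeat pattern, using your test data, might look like the step below. This is not program 4.12 from [1]; the variables removed and rc are names I made up for illustration, and I use FIND to restart and FIND_NEXT to advance (no HAS_NEXT).

```sas
/* Same test data as in the question */
data h ;
  input key hdata ;
  cards ;
1 5
1 10
1 15
2 9
2 11
2 16
3 20
3 100
4 6
5 5
5 99
;
run ;

data _null_ ;
  if 0 then set h ;
  dcl hash h (dataset: "h", multidata: "Y", ordered: "A") ;
  h.definekey ("key") ;
  h.definedata ("key", "hdata") ;
  h.definedone () ;

  do key = 1, 2, 5 ;
    /* Outer loop: a successful REMOVEDUP unsets the item list, */
    /* so start over from the first item of the group each time */
    do until (not removed) ;
      removed = 0 ;                 /* hypothetical flag: 1 = an item was removed in this pass */
      rc = h.find() ;               /* (re)position on the first item of the group */
      do while (rc = 0 and not removed) ;
        if hdata > 9 then removed = (h.removedup() = 0) ;
        /* advance only if nothing was removed, i.e. the item list is still set */
        if not removed then rc = h.find_next() ;
      end ;
    end ;
  end ;

  /* List what is left in the hash table */
  dcl hiter ih ("h") ;
  do while (ih.next() = 0) ;
    put key= hdata= ;
  end ;
run ;
```

The intent is that, for keys 1, 2 and 5, only the items with hdata <= 9 remain, while keys 3 and 4 are left untouched. Note that every removal rescans the group from its first item, so this is only reasonable for small groups.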
EDIT: I've just discovered that in my SAS 9.4M5 DO_OVER seems to play well with REMOVEDUP, so you would no longer need an outer loop to restart the enumeration of a group with two or more items to be removed, nor the HAS_NEXT method to support this iteration (as suggested on p. 92 of [1]). Instead, you can simply write something like
do key = 1, 2, 5;
  do while(h.do_over()=0);
    if hdata > 9 then h.removedup();
  end;
end;
in order to selectively remove items from same-key item groups. Not sure in which maintenance release this has been enabled. I haven't found a hint about this change in the documentation.
So, to reiterate, you can gain new insights by performing your own experiments.
Thanks. In the paper I'm reading, do_over is not mentioned. I guess it came with SAS 9.4.
BTW, I bought the 2 books today.
@FreelanceReinh very nice explanation. Quite interesting that the Removedup Method does not unset the item list when Do_Over is used instead of the Find()/Has_Next()/Find_Next() approach. This seems to be the case at least from 9.4M3. I can't seem to find any documentation that mentions it. That simplifies the selective deletion operation quite a bit.
@Mij kudos for the curiosity. I'm sure you will find many great hash tips in the two books 🙂