Mij
Fluorite | Level 6

Hi,

I tested this code:

data h ;
input key hdata ;
cards ;
1 5
1 10
1 15
2 9
2 11
2 16
3 20
3 100
4 6
5 5
5 99
;
run ;

data _null_ ;
  if 0 then set H ;
  dcl hash H (dataset: "H", multidata: "Y", ordered: "A") ;
  h.definekey ("key") ;
  h.definedata ("key", "hdata") ;
  h.definedone () ;

  do key = 1, 2, 5 ;
    put '01 - key est ' key;
    do find_rc = h.find() by 0 while (find_rc = 0) ;
      put ' au do find, find_rc est ' find_rc;
      put ' 02 - hdata est ' hdata;
      if hdata > 9 then remove_rc = h.removedup() ;
      put ' 03 - remove_rc est ' remove_rc;
      find_rc = h.find_next() ;
      put ' 04 - find_rc est ' find_rc;
    end ;
  end ;

  dcl hiter ih ("H") ;

  do while (ih.next() = 0) ;
    put key= hdata= ;
  end ;
run ;

and got this result:
NOTE: There were 11 observations read from the data set WORK.H.
01 - key est 1
au do find, find_rc est 0
02 - hdata est 5
03 - remove_rc est .
04 - find_rc est 0
au do find, find_rc est 0
02 - hdata est 10
03 - remove_rc est 0
04 - find_rc est 160038
01 - key est 2
au do find, find_rc est 0
02 - hdata est 9
03 - remove_rc est 0
04 - find_rc est 0
au do find, find_rc est 0
02 - hdata est 11
03 - remove_rc est 0
04 - find_rc est 160038
01 - key est 5
au do find, find_rc est 0
02 - hdata est 5
03 - remove_rc est 0
04 - find_rc est 0
au do find, find_rc est 0
02 - hdata est 99
03 - remove_rc est 0
04 - find_rc est 160038
key=1 hdata=5
key=1 hdata=15
key=2 hdata=9
key=2 hdata=16
key=3 hdata=20
key=3 hdata=100
key=4 hdata=6
key=5 hdata=5
NOTE: DATA STEP stopped due to looping.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
 
In the second "04" line, I was expecting a 0, because there is in fact a next item.
What is that 160038? It makes absolutely no sense!
And the same thing happens for the other keys as well.
 
You can email me at michel.jubinville@hotmail.ca
Thanks


 

Accepted Solution
FreelanceReinh
Jade | Level 19

Hi @Mij,

 

Yes, some of the hash object methods (including REMOVEDUP, not to mention SUMDUP ...) have their intricacies. Not all of those details are easily available in the documentation or in books about the subject. At some point you may want to perform your own experiments.

 

The particular issue in your code, however, is well described and solved in sections 4.3.11 and 4.3.12 of the book [1] Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study and also in chapter 5 (pp. 129, 136 f.) of the book [2] SAS® Hash Object Programming Made Easy.

 

Once REMOVEDUP has removed an item from a same-key item group, SAS (unfortunately) forgets where it was in that group: it "unsets the item list" ([1], p. 90) or, in other words, "sets the pointers to null" ([2], p. 129). Hence the nonzero return code 160038 you received from the FIND_NEXT method. To remove more (but not all) items of the group, you need to set the pointer to the first item of that group again (using FIND, DO_OVER, CHECK or REF), move the pointer to the next item to be deleted if necessary (using FIND_NEXT/FIND_PREV or DO_OVER), and repeat these steps until all deletions are done. The HAS_NEXT method can be used in this process to find out whether there are more items in the group, which may or may not qualify for deletion (see program 4.12 in [1], p. 91).
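
For illustration, here is a minimal sketch of that restart-after-removal idea. It is not the program from [1]. It assumes the question's sample data and the same "hdata > 9" condition, and the variable names rc and removed are made up for this example. After every successful REMOVEDUP it calls FIND again to reposition on the first item of the group, and it only uses FIND_NEXT while nothing has been removed in the current pass.

```sas
/* Sample data from the question, packed onto one line with @@ */
data h;
  input key hdata @@;
  cards;
1 5 1 10 1 15 2 9 2 11 2 16 3 20 3 100 4 6 5 5 5 99
;
run;

data _null_;
  if 0 then set h;   /* never executes; only defines key and hdata */
  dcl hash h (dataset: "h", multidata: "Y", ordered: "A");
  h.definekey("key");
  h.definedata("key", "hdata");
  h.definedone();

  do key = 1, 2, 5;
    /* Outer loop: rescan the group from its first item after each removal */
    do until (not removed);
      removed = 0;
      /* Inner loop: walk the group until an item qualifies or the group ends */
      do rc = h.find() by 0 while (rc = 0 and not removed);
        if hdata > 9 then do;
          rc = h.removedup();    /* the item list is unset after this call */
          removed = (rc = 0);    /* hypothetical flag: forces a fresh FIND */
        end;
        else rc = h.find_next();
      end;
    end;
  end;

  /* Write the remaining items to the log */
  dcl hiter ih ("h");
  do while (ih.next() = 0);
    put key= hdata=;
  end;
  stop;   /* assumption: end the step explicitly rather than rely on loop detection */
run;
```

The rescan makes this quadratic in the size of a group, which should not matter for small groups. If a removal ever fails, rc is nonzero and removed stays 0, so both loops end instead of retrying forever.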

 

EDIT: I've just discovered that in my SAS 9.4M5, DO_OVER seems to play well with REMOVEDUP, so you would no longer need an outer loop to restart the enumeration of a group with two or more items to be removed, nor the HAS_NEXT method to support this iteration (as suggested on p. 92 of [1]). Instead, you can simply write something like

 do key = 1, 2, 5;
   do while(h.do_over()=0);
     if hdata > 9 then h.removedup();
   end;
 end;

in order to selectively remove items from same-key item groups. Not sure in which maintenance release this has been enabled. I haven't found a hint about this change in the documentation.
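
If you want to repeat that experiment on your own release, a self-contained version could look like the sketch below. It wraps the loop above in a complete step using the question's sample data. The rc variable and the final STOP statement are additions for this example, and the whole thing relies on the undocumented DO_OVER/REMOVEDUP behavior described above, so check the log on your installation rather than taking the result for granted.

```sas
/* Sample data from the question, packed onto one line with @@ */
data h;
  input key hdata @@;
  cards;
1 5 1 10 1 15 2 9 2 11 2 16 3 20 3 100 4 6 5 5 5 99
;
run;

data _null_;
  if 0 then set h;   /* never executes; only defines key and hdata */
  dcl hash h (dataset: "h", multidata: "Y", ordered: "A");
  h.definekey("key");
  h.definedata("key", "hdata");
  h.definedone();

  do key = 1, 2, 5;
    /* DO_OVER enumerates the items of the current key's group */
    do while (h.do_over() = 0);
      if hdata > 9 then rc = h.removedup();
    end;
  end;

  /* Write the remaining items to the log for inspection */
  dcl hiter ih ("h");
  do while (ih.next() = 0);
    put key= hdata=;
  end;
  stop;   /* assumption: end the step explicitly rather than rely on loop detection */
run;
```

If the behavior holds on your release, no item with hdata > 9 should remain for keys 1, 2 and 5, while keys 3 and 4 are left untouched.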

 

So, to reiterate, you can gain new insights by performing your own experiments.


Mij
Fluorite | Level 6

Thanks. In the paper I'm reading, do_over is not mentioned. I guess it came with SAS 9.4.

BTW, I bought the 2 books today.

PeterClemmensen
Tourmaline | Level 20

@FreelanceReinh very nice explanation. Quite interesting that the Removedup method does not unset the item list when Do_Over is used instead of the Find()/Has_Next()/Find_Next() approach. This seems to be the case at least from 9.4M3. I can't find any documentation that mentions it. That simplifies the selective deletion operation quite a bit.

 

@Mij kudos for the curiosity. I'm sure you will find many great hash tips in the two books 🙂

 

 

Mij
Fluorite | Level 6
The books will be mine next Friday. Pretty sure I'll find lots of interesting stuff.
Thanks and have a nice day.
