BookmarkSubscribeRSS Feed
lawatkey
Obsidian | Level 7

Hi, so I'm running a SAS hash step to join together two sets of data on 3 criteria, one of the criteria having a range (it's a date field where the dates can be close and would still be considered a match if the other 2 static fields were a 100% match).  

The program works just fine, but my issue comes from a misunderstanding:  I thought that hashing doesn't reevaluate a hashed/matched field after the match is found.  Example:  If row 1 of my WORK.BASE matches to row 50 of WORK.HAVE, I expected row 50 to not be evaluated again, but it turns out I am getting a fair amount of duplicate matches.  I can have something like rows 31-50 of my WORK.BASE matching to the same row 100 in my WORK.HAVE.  Is there a way around this?  Is there a way to just keep the best case match and no longer use that match in my hash?  Would this require a lengthy row by row macro?

 

DATA WORK.WANT;

SET WORK.BASE;

KEY=.;

IF _N_ = 1

THEN DO;

   DECLARE HASH B (DATASET:"WORK.HAVE (RENAME=(DYNAMIC_FIELD = B_DYNAMIC_FIELD))";

   B.DEFINEKEY("STATIC_FIELD_1","STATIC_FIELD_2","B_DYNAMIC_FIELD");

   B.DEFINEDATA("KEY");

   B.DEFINEDONE();

END;

B_DYNAMIC_FIELD = DYNAMIC_FIELD;

RC = B.FIND();

DO WHILE (RC NE -5 AND B_DYNAMIC_FIELD < DYNAMIC_FIELD + 5);

   B_DYNAMIC_FIELD + 1;

   RC = B.FIND();

END;

DROP RC B_DYNAMIC_FIELD;

RUN;

 

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Unsure what the issue is but:

1. You don't use the multidata option, so each key is unique. So you do not get duplicate matches.

Each match is the one for which you supplied the key values.

2.  Is there a way to [..] no longer use that match in my hash? 
You can use the delete method. After you match, add B.DELETE() to ensure this entry will never ne matched again.

lawatkey
Obsidian | Level 7

1. I'm not sure what you mean with this option, I'm not a hashing expert, I just adapted code I found online to work for my needs.

In my example, let's say my base table has 100 rows, my "B" table (that I'm using as my hash) has 50 rows.  I am finding that multiple rows from my base table match to the same row in my B/Hash table, with no multioption.

2. Would I just add a row with B.DELETE(); after the B.DEFINEDONE(); row?  But this sounds like exactly what I'm looking for 🙂

ChrisNZ
Tourmaline | Level 20

You'd use it after FIND:

 

RC=B.FIND():

if RC=0 then RC=B.DELETE();

lawatkey
Obsidian | Level 7

I tried adding the "if RC=0 then RC=B.DELETE();" after the first RC=B.FIND(); the second, and both, and I always get a DATA STEP component object failure, am I missing something in the syntax?

 

DATA WORK.WANT;

SET WORK.BASE;

KEY=.;

IF _N_ = 1

THEN DO;

   DECLARE HASH B (DATASET:"WORK.HAVE (RENAME=(DYNAMIC_FIELD = B_DYNAMIC_FIELD))";

   B.DEFINEKEY("STATIC_FIELD_1","STATIC_FIELD_2","B_DYNAMIC_FIELD");

   B.DEFINEDATA("KEY");

   B.DEFINEDONE();

END;

B_DYNAMIC_FIELD = DYNAMIC_FIELD;

RC = B.FIND();

if RC=0 then RC=B.DELETE();

DO WHILE (RC NE -5 AND B_DYNAMIC_FIELD < DYNAMIC_FIELD + 5);

   B_DYNAMIC_FIELD + 1;

   RC = B.FIND();

if RC=0 then RC=B.DELETE();

END;

DROP RC B_DYNAMIC_FIELD;

RUN;

ChrisNZ
Tourmaline | Level 20

Agree with @Patrick , some data to help you would be better so we can provide tested code.

About the message, I am not too sure. What version of SAS do you run?  Run %put &=sysvlong;

lawatkey
Obsidian | Level 7
if rc = 0
then do;
  rc = b.remove();
end;

This is what I ended up using and it worked, my rc ne -5 part was also not doing what I was expecting it to do, so thank you! This solved my duplication issue 🙂  

Patrick
Opal | Level 21

@lawatkey I never understood in your description why the same item in the hash (the same date) can't be the closest match to two rows in your base table. Sure, if you remove the item after selecting it then you won't get it twice. May be make at least sure that your base table is pre-sorted by this "dynamic" variable as else the result will likely be rather "random".

 

Question:

If you run your code with a base table once sorted ascending by date and once sorted descending by date: Do you get the exactly same result?

Patrick
Opal | Level 21

I believe it would be best you provide sample data (via working data steps) and show us the expected result. That would remove a lot of ambiguity in your question and also allow us to provide tested code as answer.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 874 views
  • 3 likes
  • 3 in conversation