Re: SAS Hashing but make each hash joined in unique?

lawatkey · Posted 10-06-2021 08:32 PM

Hi, so I'm running a SAS hash step to join together two sets of data on 3 criteria, one of the criteria having a range (it's a date field where the dates can be close and would still be considered a match if the other 2 static fields were a 100% match).

The program works just fine, but my issue comes from a misunderstanding: I thought that hashing doesn't reevaluate a hashed/matched field after the match is found. Example: If row 1 of my WORK.BASE matches to row 50 of WORK.HAVE, I expected row 50 to not be evaluated again, but it turns out I am getting a fair amount of duplicate matches. I can have something like rows 31-50 of my WORK.BASE matching to the same row 100 in my WORK.HAVE. Is there a way around this? Is there a way to just keep the best case match and no longer use that match in my hash? Would this require a lengthy row by row macro?

DATA WORK.WANT;

SET WORK.BASE;

KEY=.;

IF _N_ = 1

THEN DO;

DECLARE HASH B (DATASET:"WORK.HAVE (RENAME=(DYNAMIC_FIELD = B_DYNAMIC_FIELD))";

B.DEFINEKEY("STATIC_FIELD_1","STATIC_FIELD_2","B_DYNAMIC_FIELD");

B.DEFINEDATA("KEY");

B.DEFINEDONE();

END;

B_DYNAMIC_FIELD = DYNAMIC_FIELD;

RC = B.FIND();

DO WHILE (RC NE -5 AND B_DYNAMIC_FIELD < DYNAMIC_FIELD + 5);

B_DYNAMIC_FIELD + 1;

RC = B.FIND();

END;

DROP RC B_DYNAMIC_FIELD;

RUN;

ChrisNZ · Posted 10-06-2021 08:54 PM

Unsure what the issue is but:

1. You don't use the multidata option, so each key is unique. So you do not get duplicate matches.

Each match is the one for which you supplied the key values.

2. Is there a way to [..] no longer use that match in my hash?
You can use the delete method. After you match, add B.DELETE() to ensure this entry will never ne matched again.

High-Performance SAS Coding - Third Edition

lawatkey · Posted 10-06-2021 09:27 PM

1. I'm not sure what you mean with this option, I'm not a hashing expert, I just adapted code I found online to work for my needs.

In my example, let's say my base table has 100 rows, my "B" table (that I'm using as my hash) has 50 rows. I am finding that multiple rows from my base table match to the same row in my B/Hash table, with no multioption.

2. Would I just add a row with B.DELETE(); after the B.DEFINEDONE(); row? But this sounds like exactly what I'm looking for 🙂

ChrisNZ · Posted 10-06-2021 11:50 PM

You'd use it after FIND:

RC=B.FIND():

if RC=0 then RC=B.DELETE();

High-Performance SAS Coding - Third Edition

lawatkey · Posted 10-07-2021 09:44 AM

I tried adding the "if RC=0 then RC=B.DELETE();" after the first RC=B.FIND(); the second, and both, and I always get a DATA STEP component object failure, am I missing something in the syntax?

DATA WORK.WANT;

SET WORK.BASE;

KEY=.;

IF _N_ = 1

THEN DO;

DECLARE HASH B (DATASET:"WORK.HAVE (RENAME=(DYNAMIC_FIELD = B_DYNAMIC_FIELD))";

B.DEFINEKEY("STATIC_FIELD_1","STATIC_FIELD_2","B_DYNAMIC_FIELD");

B.DEFINEDATA("KEY");

B.DEFINEDONE();

END;

B_DYNAMIC_FIELD = DYNAMIC_FIELD;

RC = B.FIND();

if RC=0 then RC=B.DELETE();

DO WHILE (RC NE -5 AND B_DYNAMIC_FIELD < DYNAMIC_FIELD + 5);

B_DYNAMIC_FIELD + 1;

RC = B.FIND();

if RC=0 then RC=B.DELETE();

END;

DROP RC B_DYNAMIC_FIELD;

RUN;

ChrisNZ · Posted 10-07-2021 06:36 PM

Agree with @Patrick , some data to help you would be better so we can provide tested code.

About the message, I am not too sure. What version of SAS do you run? Run %put &=sysvlong;

High-Performance SAS Coding - Third Edition

lawatkey · Posted 10-07-2021 09:32 PM

if rc = 0
then do;
  rc = b.remove();
end;

This is what I ended up using and it worked, my rc ne -5 part was also not doing what I was expecting it to do, so thank you! This solved my duplication issue 🙂

Patrick · Posted 10-08-2021 12:31 AM

@lawatkey I never understood in your description why the same item in the hash (the same date) can't be the closest match to two rows in your base table. Sure, if you remove the item after selecting it then you won't get it twice. May be make at least sure that your base table is pre-sorted by this "dynamic" variable as else the result will likely be rather "random".

Question:

If you run your code with a base table once sorted ascending by date and once sorted descending by date: Do you get the exactly same result?

Patrick · Posted 10-07-2021 03:18 AM

I believe it would be best you provide sample data (via working data steps) and show us the expected result. That would remove a lot of ambiguity in your question and also allow us to provide tested code as answer.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away