BookmarkSubscribeRSS Feed
cgates
Obsidian | Level 7

I'm testing out how to use hash objects in SAS 9.4 M6 to do fuzzy joins since PROC SQL just runs for hours on my larger dataset. I created some sample datasets (below) and what I want is for the merge to pull in exact matches on the "name" fields AND any matches that have a COMPLEV score less than 10. Right now, this code still only pulls in the exact matches. I'm very new to hash objects so I'm sure it's a simple fix but I've tried am in need of help.

 

data A;
infile datalines missover;
length nameA $50;
input nameA $ ;
datalines;
MICKEYMOUSE2000-01-02
DAFDUCK1990-09-23
GOOFYMAN1993-05-11
;
run;

*second dataset with one exact match and two that differ slightly from those in dataset A;
data B;
infile datalines missover;
length nameB $50;
input nameB $ VDAY :ddmmyy10.;
format VDAY ddmmyy10.;
datalines;
MICKEYMOUSE2000-01-01 07/08/2021
DAFFYDUCK1990-09-23 05/11/2021
GOOFYMAN1993-05-11 08/11/2021
;
run;

*only pulling in exact matches, want it to pull in other fuzzy matches;

data simplemerge ;

if 0 then set work.B ; *load var properties into hash table;

if _n_ = 1 then do;
    dcl hash B (dataset: 'work.B'); *declare the name B for hash using B dataset;
    B.definekey('nameB');*identify var in B to use as key;
    B.definedata('nameB','vday');*identify columns of data to bring in from B dataset;
    B.definedone();*complete hash table definition;
end;

set work.A; *bring in A data;

if B.Find(KEY: nameA) ne 0 then do;
        if complev(nameA, nameB) < 10 then do;
            B.ref(key : nameB,data : nameB, data : vday);
        end;
end;

RUN;
1 REPLY 1
ChrisNZ
Tourmaline | Level 20

Hash tables only perform equijoins.  

 

To speed up your fuzzy SQL match, split it. For example:

1- Match on same value

2- Take the unmatched records and match on same uppercase values

3- Take the unmatched records and match on same first letter and same compbl() value

4- Take the unmatched records and match on same first letter and same length and complev()

5- Take the unmatched records and match on same first letter and complev()

6- Take the unmatched records and match on complev()

 

Adapt the steps to your data. The goal is to increase match complexity as you lower volume.

The code above is just an example: I'd hope that you have standardised things such case, punctuation or spacing. 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 862 views
  • 1 like
  • 2 in conversation