DATA Step, Macro, Functions and more

Merging on a Partial/Prefix Key

Reply
Occasional Contributor
Posts: 16

Merging on a Partial/Prefix Key

I don't know why I have not encountered this before, but...

I need to merge two SAS data sets, both with the same key variable, but in the lookup data set the key may only contain the first few characters of the key in the master data set. For some rows, the key values in both data sets will exactly match. In other rows, the lookup data set may only be partial, but still exact for as many characters as are in the key value. When the lookup data set has only part of the full key value (essentially, a prefix), it will still be unique: that is, no two rows in the lookup data set will have the same key value. So, the merge would need to be similar to performing a WHERE statement using the LIKE operator.

 

I was thinking of pursuing fuzzy matching, but that process seems too impercise for my needs. I don't risk incorrect matching as long as I can match based on the first few characters of the key in the lookup data set.

 

Any ideas on the right direction to take?

PROC Star
Posts: 7,363

Re: Merging on a Partial/Prefix Key

Can you provide examples of the two data steps?

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 16

Re: Merging on a Partial/Prefix Key

The two data sets might look something like this:

 

Master data set:

 

Contract            Dollars

ABC12345        1000

BCD98765        5000

XYZ99999        100

 

Lookup data set:

 

Contract           Customer

ABC12345       Army

BCD                 Navy

XYZ99999       Air Force

 

An exact match using contract as the BY variable will merge contracts ABC12345 and XYZ99999, but not contract BCD. but I want to match it with customer "Navy".

PROC Star
Posts: 7,363

Re: Merging on a Partial/Prefix Key

Here is one way:

data master;
  input Contract $ Dollars;
  cards;
ABC12345        1000
BCD98765        5000
XYZ99999        100
;

data lookup;
  input Contract$ Customer $;
  cards;
ABC12345       Army
BCD                 Navy
XYZ99999       Air Force
;

proc sql noprint;
  create table want as
    SELECT a.*, b.Customer
      FROM master a
       left JOIN lookup b
         ON a.Contract LIKE catt('%', b.Contract, '%') 
  ;
quit;

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 16

Re: Merging on a Partial/Prefix Key

Thanks, I had a hunch the answer would invlove sharpening my SQL skills. I will give this a whirl later today.

Super User
Posts: 10,497

Re: Merging on a Partial/Prefix Key

If you search the Base SAS Programming board for "fuzzy match" you'll find a number of related posts and solutions.

Super User
Posts: 9,676

Re: Merging on a Partial/Prefix Key

Or Perl Regurlar Expression could you more power.

 

data master;
  input Contract $ Dollars;
  cards;
abc        1000
BCD98765        5000
XYZ99999        100
;

data lookup;
  input Contract$ Customer $;
  cards;
ABC12345       Army
BCD                 Navy
XYZ99999       Air Force
;

proc sql ;
    SELECT a.*, b.Customer,b.Contract as b
      FROM master a
       inner JOIN lookup b
         ON prxmatch(cats('/^',a.Contract,'/i'),b.Contract) or 
            prxmatch(cats('/^',b.Contract,'/i'),a.Contract)
  ;
quit;
Occasional Contributor
Posts: 16

Re: Merging on a Partial/Prefix Key

I used this method on my 12.4 million row master table. After 24 hours I stopped it. It processed about 1/4 of the master table, but does have a match for each row. I'm going to visually inspect the matching to see if it appears to be accurate. Then, I may subset the master table and run the process in pieces, unless anyone has an idea how to make this run more efficiently. I could try indexing both the master table and the lookup table, but I don't know if indexes matter to this matching process. Thanks.

Super User
Posts: 9,676

Re: Merging on a Partial/Prefix Key

If you have a big table, then it is not suited for SQL.

Hash Table + Data Step might be more efficient.

What size of these two tables ?

Occasional Contributor
Posts: 16

Re: Merging on a Partial/Prefix Key

The master table is 12.4 million rows. The lookup table is about 1,000 rows.

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Super User
Posts: 9,676

Re: Merging on a Partial/Prefix Key

Yeah. Try @art297 's SQL + LIKE firstly .

PROC Star
Posts: 7,363

Re: Merging on a Partial/Prefix Key

@CurtisSmithDCAA: Did you try the LIKE operator or the regular expression method? I would think that 12.4 Million records should not have posed any problem, in itself, for PROC SQL.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 9,676

Re: Merging on a Partial/Prefix Key

Also need consider this scenario .

 

data master;
  input Contract $ Dollars;
  cards;
BCD        5000
;

data lookup;
  input Contract$ Customer $;
  cards;
XXBCDXX                Navy
;

 

Occasional Contributor
Posts: 16

Re: Merging on a Partial/Prefix Key

The SQL with the LIKE operator did run after several hours but the resulting table had more rows than the master table. I have not had a chance to diagnose why. The method with the Perl Regurlar Expression seems to work without creating addiditonal rows, but at the rate it was running would take about 4 days to complete.

PROC Star
Posts: 7,363

Re: Merging on a Partial/Prefix Key

@CurtisSmithDCAA: I think you just have to add DISTINCT to your select statement. Otherwise, you'll get multiple rows if there are both partial and full matches. e.g.:

data master;
  input Contract $ Dollars;
  cards;
ABC12345        1000
BCD98765        5000
XYZ99999        100
;

data lookup;
  input Contract$ Customer $;
  cards;
ABC12345       Army
BCD            Navy
BCD98765       Navy
XYZ99999       Air Force
;

proc sql noprint;
  create table want as
    SELECT distinct a.*, b.Customer
      FROM master a
       left JOIN lookup b
         ON a.Contract LIKE catt('%', b.Contract, '%') 
  ;
quit;

Art, CEO, AnalystFinder.com

Ask a Question
Discussion stats
  • 23 replies
  • 189 views
  • 0 likes
  • 4 in conversation