BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
somebody
Lapis Lazuli | Level 10

I have 2 datasets A and B containing company names. B contains correct names whereas A contains slightly wrong names. How can I ask SAS to find match obs in A with obs in B that are similar? an Exambple would be:

- match "AGL ENRGY LTD" in A to  "AGL ENERGY LTD" in B; or

- match "AMER CAP" in A to "AMERICAN CAPITAL" in B; or

- match "EMPIRE CO' in A to "EMPIRE COMPANY" in B

 

I have been manually finding abbreviations and change them to full such as CO to COMPANY, or CORP to CORPORATION but there are still obs with missing letters in name. One way I can think of is to match all obs in B to each obs in A, and then use COMPGED or COMPLEV to get a similarity score and use the one with highest score. However, this would create a very large dataset. And how do I match all obs in B to each obs in A?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Functions such as COMPGED perfectly answer your needs, but they are expensive.

You are right to clean your data before using them: LTD/LIMITED, etc.

This is an iterative process.

First try to also match on something else. For example 

  where first(NAME1)=first(NAME2) and compged(NAME1,NAME2) < some small value 

As you match more and more, you can loosen the criteria on the reduced volume of unmatched names.

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Functions such as COMPGED perfectly answer your needs, but they are expensive.

You are right to clean your data before using them: LTD/LIMITED, etc.

This is an iterative process.

First try to also match on something else. For example 

  where first(NAME1)=first(NAME2) and compged(NAME1,NAME2) < some small value 

As you match more and more, you can loosen the criteria on the reduced volume of unmatched names.

somebody
Lapis Lazuli | Level 10

I have been matching using the first 3 words in the names, and then 2 and then 1. But if there are some errors in the first word then they don't match.

Do you know how to match all obs in B to each obs in A?

 

smantha
Lapis Lazuli | Level 10
You can use compged like chris suggested or there are whole other suite like spedis etc. you can create a separate scoring set to do the mapping.
ChrisNZ
Tourmaline | Level 20
I am unsure what's unclear in my reply. Sorry.

> Do you know how to match all obs in B to each obs in A?
Unsure what this means either.
somebody
Lapis Lazuli | Level 10

I would like to create a new dataset that has all observations in B for every observation in A. For exambple, if A has 5 observation and B has 10 obs, then the new merged dataset would have 50 observations. How can I perform this merge?

 

ChrisNZ
Tourmaline | Level 20

>  if A has 5 observation and B has 10 obs, then the new merged dataset would have 50 observations

Use:   from TABLE1, TABLE2   without a   where  clause to create such a join.

That's a called a Cartesian join. Why would you do that?

Your current method is correct:

1. Standardise the data

2. Join on increasingly looser criteria. Only try to match the unmatched data.
  - Straight equality

  - Almost equal (this can be many steps)

  - Not quite the same (this can be many steps)

  - Quite different (this can be many steps)

  Keep track of what criterion was used when you achieve a match.

 

Patrick
Opal | Level 21

On top of what others suggested: Do you have the SAS Data Quality Server licensed? If so then this would allow you to standardize company names and then join over the standardized names.

somebody
Lapis Lazuli | Level 10

Do you know how to check is my SAS has the licence?

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 727 views
  • 1 like
  • 4 in conversation