BookmarkSubscribeRSS Feed
syazwaan
Calcite | Level 5

I have 2 datasets (Data and Locality). I want to match string between column 'ADDR1' in first dataset (Data) with 'Locality Name' in second dataset (Locality). The 'Data' dataset contain 800,000+ rows and the 'Locality' dataset contain 2,000+ rows.

 

Example:

ADDR1Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGKAMPUNG TERSUSUN SERKAM DARAT
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATTAMAN REMBIA SETIA
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIAJALAN BUKIT BERUANG
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIAJALAN BUKIT BERUANG

 

Result that I want:

ADDR1 Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGJALAN BUKIT BERUANG
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATKAMPUNG TERSUSUN SERKAM DARAT
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIATAMAN REMBIA SETIA
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIATAMAN REMBIA SETIA

 

Thank you.

6 REPLIES 6
andreas_lds
Jade | Level 19

Please post both datasets in usable form.

The data you have posted looks like you want an exact match, so sorting the data and using data step with merge should solve the issue.

syazwaan
Calcite | Level 5

Example:

ADDR1Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGKAMPUNG TERSUSUN SERKAM DARAT
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATTAMAN REMBIA SETIA
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIAJALAN BUKIT BERUANG
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIAJALAN BUKIT BERUANG

 

Result:

ADDR1Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGJALAN BUKIT BERUANG
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATKAMPUNG TERSUSUN SERKAM DARAT
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIATAMAN REMBIA SETIA
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIATAMAN REMBIA SETIA
Kurt_Bremser
Super User

Simple SQL join:

proc sql;
create table want as
  select
    a.ADDR1,
    b.Locality_Name
  from have1 a left join have2 b
  on a.ADDR1 = b.Locality_Name
;
quit;
syazwaan
Calcite | Level 5

I have 2 datasets (Data and Locality). I want to match string between column 'ADDR1' in first dataset (Data) with 'Locality Name' in second dataset (Locality). The 'Data' dataset contain 800,000+ rows and the 'Locality' dataset contain 2,000+ rows.

 

Example:

ADDR1Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGKAMPUNG TERSUSUN SERKAM DARAT
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATTAMAN REMBIA SETIA
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIAJALAN BUKIT BERUANG
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIAJALAN BUKIT BERUANG

 

Result that I want:

ADDR1 Locality Name
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANGJALAN BUKIT BERUANG
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARATKAMPUNG TERSUSUN SERKAM DARAT
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIATAMAN REMBIA SETIA
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIATAMAN REMBIA SETIA

 

Thank you.

PGStats
Opal | Level 21

You could get the simplest matches with the CONTAINS operator:

 

proc sql;
create table result as
select 
	a.addr1,
	b.localityName
from
	data as a left join
	locality as b on a.addr1 contains strip(b.localityName);
quit;
PG
andreas_lds
Jade | Level 19

@syazwaan: i merged your new message titled "String Matching" with the older message. Please don't double post questions.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1360 views
  • 0 likes
  • 4 in conversation