I have 2 datasets (Data and Locality). I want to match string between column 'ADDR1' in first dataset (Data) with 'Locality Name' in second dataset (Locality). The 'Data' dataset contain 800,000+ rows and the 'Locality' dataset contain 2,000+ rows.
Example:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | KAMPUNG TERSUSUN SERKAM DARAT |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | TAMAN REMBIA SETIA |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
Result that I want:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | JALAN BUKIT BERUANG |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | KAMPUNG TERSUSUN SERKAM DARAT |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
Thank you.
Please post both datasets in usable form.
The data you have posted looks like you want an exact match, so sorting the data and using data step with merge should solve the issue.
Example:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | KAMPUNG TERSUSUN SERKAM DARAT |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | TAMAN REMBIA SETIA |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
Result:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | JALAN BUKIT BERUANG |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | KAMPUNG TERSUSUN SERKAM DARAT |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
Simple SQL join:
proc sql;
create table want as
select
a.ADDR1,
b.Locality_Name
from have1 a left join have2 b
on a.ADDR1 = b.Locality_Name
;
quit;
I have 2 datasets (Data and Locality). I want to match string between column 'ADDR1' in first dataset (Data) with 'Locality Name' in second dataset (Locality). The 'Data' dataset contain 800,000+ rows and the 'Locality' dataset contain 2,000+ rows.
Example:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | KAMPUNG TERSUSUN SERKAM DARAT |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | TAMAN REMBIA SETIA |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | JALAN BUKIT BERUANG |
Result that I want:
ADDR1 | Locality Name |
NO 35 JALAN BUKIT BERUANG TIGA TAMAN BUKIT BERUANG | JALAN BUKIT BERUANG |
LOT 2697 JALAN TIONG KAMPUNG TERSUSUN SERKAM DARAT | KAMPUNG TERSUSUN SERKAM DARAT |
RM 1024 JALAN REMBIA SETIA 5 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
RM 713 JALAN SETIA 3 TAMAN REMBIA SETIA | TAMAN REMBIA SETIA |
Thank you.
You could get the simplest matches with the CONTAINS operator:
proc sql;
create table result as
select
a.addr1,
b.localityName
from
data as a left join
locality as b on a.addr1 contains strip(b.localityName);
quit;
@syazwaan: i merged your new message titled "String Matching" with the older message. Please don't double post questions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.