BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JH74
Obsidian | Level 7

Hello.  I'm not sure if this is even possible but thought I'd check.  I have 2 datasets.  They both have multiple variables, but the one that is similar between the two are address variables, and those are the variables I need to try and match. 

 

This issue is, one set has values such as:

193 FAIRVIEW LN
1006 NUT TREE ROAD
6121 PASEO DEL NORTE
9401 PAINTER AVE
3744 LONG BEACH BLVD
11801 PIERCE ST FL 

 

While the other has values similar to this:

193 FAIRVIEW LN STE 100
1006 NUT TREE ROAD APT 5
6121 PASEO DEL NORTE BLDNG 2
9401 PAINTER AVE SUITE 2A
3744 LONG BEACH BLVD SUITE 224
11801 PIERCE ST FL APT 315

 

I want to create a format from one dataset and use that format on the other data set to create a new variable (say addr_match), where if theres a match, then addr_match = 1.  

 

Is there a way to match these variables where if part of the addresses match then it'd be addr_match = 1, or do the values in the address fields need to be exactly the same?  There are thousands of rows in each dataset.

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

To create a dataset of the matched results just add a CREATE TABLE to the query:

proc sql;
  create table Want as
  select h1.addr_short, h2.addr_long
  from have1 h1 inner join have2 h2
  on h2.addr_long like cats(h1.addr_short,'%')
  ;
quit;

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

Matching addresses is complicated. In fact SAS has aproduct designed to do this  - SAS Data Quality. The question I have is how do you know the examples you provide are indeed the same addresses? If you are validating these addresses for postal purposes, then an apartment number at a street address is not the same as the actual street address itself. If all you are required to do is match the street address, that might be good enough.

ballardw
Super User

Provide some example data in the form of data steps and perhaps you'll get some example code.

 

If you values are that "clean", not having to match "LN" to "LANE" for example it may be quite doable.

Format per se may not be the best approach though. Or you would have to code a function and then create a format using that function and may be more work that some sort of Join with a scoring function such as Verify or COMGED

Patrick
Opal | Level 21

Your sample data indicates that the shorter string is exactly contained in the longer string. If that's representative for your real data then code along the line of below could work. 

data have1;
  infile datalines truncover;
  input addr_short $100.;
  datalines;
193 FAIRVIEW LN
1006 NUT TREE ROAD
6121 PASEO DEL NORTE
9401 PAINTER AVE
3744 LONG BEACH BLVD
11801 PIERCE ST FL 
;
data have2;
  infile datalines truncover;
  input addr_long $150.;
  datalines;
193 FAIRVIEW LN STE 100
1006 NUT TREE ROAD APT 5
6121 PASEO DEL NORTE BLDNG 2
9401 PAINTER AVE SUITE 2A
3744 LONG BEACH BLVD SUITE 224
11801 PIERCE ST FL APT 315
;

/* using SAS find() function */
proc sql;
  select h1.addr_short, h2.addr_long
  from have1 h1 inner join have2 h2
  on find(h2.addr_long,strip(h1.addr_short))>0
  ;
quit;

/* using SQL like operator */
proc sql;
  select h1.addr_short, h2.addr_long
  from have1 h1 inner join have2 h2
  on h2.addr_long like cats(h1.addr_short,'%')
  ;
quit;

Patrick_0-1694222820971.png

 

JH74
Obsidian | Level 7

Thank you for this.  I've ran this and I do get results showing matches.  I want to output all matching values into a new dataset (move values from the h1 dataset that match the h2 values).  Using the code you provided, is there a step I can add to output the results?  Or can I do this in a different way that can output a new dataset that contains the matches?

SASKiwi
PROC Star

To create a dataset of the matched results just add a CREATE TABLE to the query:

proc sql;
  create table Want as
  select h1.addr_short, h2.addr_long
  from have1 h1 inner join have2 h2
  on h2.addr_long like cats(h1.addr_short,'%')
  ;
quit;
HB
Barite | Level 11 HB
Barite | Level 11
Shorter string in a longer string
---h2.addr_long like cats(h1.addr_short,'%')
Worth the price of admission right there.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 907 views
  • 4 likes
  • 5 in conversation