BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

I have many fields that are like in this example here address = 'Lot B Plan 1234' however, I need to check if this matches the same field where it would be incased in  address = 'PID 004-411-439 Lot B Plan 1234'.. but the problem is there are many scenarios like this so could be 1000s.   So same field name but if 1 entry is encased in another entry then it would trigger a match, else no match.. could be 10000's rows and 1000s different scenarios.  Is this dooable ?

4 REPLIES 4
quickbluefish
Barite | Level 11

If this is just a simple case where a given string is potentially an exact substring of another string contained in the same column, then I would just do a full join of the data to itself using the LIKE operator in SQL.  Something like this:

proc sql;
create table matches as
select a.address as address_sub, b.address
from
    indata A
    left join
    indata B
    on b.address^=a.address and b.address like compress("%" || a.address || "%")
order by a.address
quit;
    

...but LIKE can be very slow.  If you have something more complex, you will need regex or something else.  Are the data in a SAS dataset, on a SQL RDBMS or somewhere else?  SAS would not necessarily be my first choice for this.  

Kurt_Bremser
Super User

Do a full self join and filter:

data have;
input address $40.;
datalines;
Lot B Plan 1234
PID 004-411-439 Lot B Plan 1234
;

proc sql;
create table want as
  select
    a.address as address_a,
    b.address as address_b
  from
    have a, have b
  where
    index(a.address,strip(b.address)) > 0 and a.address ne address_b
;
quit;

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 262 views
  • 4 likes
  • 3 in conversation