I have many fields that are like in this example here address = 'Lot B Plan 1234' however, I need to check if this matches the same field where it would be incased in address = 'PID 004-411-439 Lot B Plan 1234'.. but the problem is there are many scenarios like this so could be 1000s. So same field name but if 1 entry is encased in another entry then it would trigger a match, else no match.. could be 10000's rows and 1000s different scenarios. Is this dooable ?
If this is just a simple case where a given string is potentially an exact substring of another string contained in the same column, then I would just do a full join of the data to itself using the LIKE operator in SQL. Something like this:
proc sql;
create table matches as
select a.address as address_sub, b.address
from
indata A
left join
indata B
on b.address^=a.address and b.address like compress("%" || a.address || "%")
order by a.address
quit;
...but LIKE can be very slow. If you have something more complex, you will need regex or something else. Are the data in a SAS dataset, on a SQL RDBMS or somewhere else? SAS would not necessarily be my first choice for this.
Do a full self join and filter:
data have;
input address $40.;
datalines;
Lot B Plan 1234
PID 004-411-439 Lot B Plan 1234
;
proc sql;
create table want as
select
a.address as address_a,
b.address as address_b
from
have a, have b
where
index(a.address,strip(b.address)) > 0 and a.address ne address_b
;
quit;
Thank you
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.