Morning World,
I am looking to optimise my code by removing the additional array match (/* before city name street match*/), which I believe can be folded into the other one (/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/), so that everything is done on the STREET_MATCH variable rather than on both that and before_MATCH.
The final result gives me what I need: it identifies an actual city, identifies when the name is actually a street (the city name followed by STREET, ROAD, etc.), and takes into account cases where the street word comes before the name (RUE DE PARIS, Swaziland, for example).
The code works, but it is much slower since I added the regex for the before-city-name clause (it ran fast with just the city-name-street clause). I expected that, but obviously I want to make this optimal. I also expect more examples than rue de ..., so I will need to build this in a better way too.
%let rue = rue de;
data CITY_FINAL;
array regex (&num_recs) $200 _temporary_;
array CITY_ARRAY (&num_recs) $200 _temporary_;
array regex_before (&num_recs) $200 _temporary_;
* load cities names into list;
if _n_=1 then
do p=1 to nobs;
set CITIES point=p nobs=nobs;
* Build REGEX for this "City_name with street/road/avenue";
regex(p)=cats("/\b(",City_name,")( st| ro| av)/i");
/* Build CITY_ARRAY for "City_name" */
CITY_ARRAY(p)=City_name;
/* Build array for instances where the street name is before the city name i.e rue de "City_name" .*/
regex_before(p)=cats("/\b(&rue )(",City_name,")/i");
end;
* keep matches;
set test_data;
/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/
STREET_MATCH = 'N';
found=0;
do i=1 to nobs until(found);
found=prxmatch(regex(i),ADDRESS);
end;
if found THEN
DO;
STREET_MATCH='Y';
end;
/*ADD A MARKER TO IDENTIFY IF THERE IS A MATCH TO THE LISTED CITITES*/
CITY_MATCH = 'N';
/*RUN THROUGH THE ARRAY*/
DO j=1 TO dim(CITY_array);
/*CHECK AGAINST ADDRESS FIELDS FOR MATCH - IS THE CITY PART OF THE ADDRESS FIELD. USE UPPER CASE ON BOTH VARIABLES */
IF indexw(upcase(ADDRESS), upcase(CITY_ARRAY[j])) THEN
DO;
CITY_MATCH='Y';
LEAVE;
END;
end;
/* before city name street match*/
before_MATCH = 'N';
found2=0;
do k=1 to nobs until(found2);
found2=prxmatch(regex_before(k),ADDRESS);
end;
if found2 THEN
DO;
before_MATCH='Y';
end;
run;
Test samples, in case they help:
DATA CITIES;
input City_name $char20.;
datalines;
KINGSLANDING
MORDOR
RIVENDELL
WINTERFELL
;
run;
DATA test_data;
input ADDRESS $char80.;
datalines;
HAMBURG
KINGSLANDING
GENEVA
paris
birmingham
Zurich
SINGAPORE
MORDOR - CAN YOU STILL IDENTIFY ME?
RIVENDELL 01 STILL TESTING WTH ADD TXT
TEST MORDOR ST
247 WINTERFELL STREET
MORDOR st
sMORDOR st
MORRDOR road
rue de RIVENDELL
;
run;
Here is an idea you can try,
Replace your Arrays with Hash Objects, and Hash Iterators http://www.lexjansen.com/search/searchresults.php?q=hash%20Object
The Hash Object will provide you with functions to add(), and find(), and this will avoid/replace your Array Looping.
Just a different programming approach, hope it helps,
Ahmed
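To make the hash suggestion concrete, here is a minimal sketch of how the CITY_MATCH array loop could be replaced with a hash lookup. It rests on assumptions that are not in the original post: every city name is a single word, the names in CITIES are stored in upper case (as in the sample data), and the output data set name CITY_FINAL_HASH is made up for illustration. A hash find/check is an exact key lookup, so it does not cover the street patterns by itself; those would still need the regex.

```sas
data CITY_FINAL_HASH;                      /* hypothetical output name */
   length City_name $20;                   /* same length as in CITIES */

   if _n_ = 1 then do;
      /* Load the city list once into a hash keyed on City_name */
      declare hash cities(dataset: "CITIES");
      cities.defineKey("City_name");
      cities.defineDone();
      call missing(City_name);
   end;

   set test_data;

   CITY_MATCH = 'N';
   /* Look up each blank-delimited word of ADDRESS instead of looping over all cities */
   do w = 1 to countw(ADDRESS, ' ');
      /* Words longer than 20 characters are truncated by this assignment */
      City_name = upcase(scan(ADDRESS, w, ' '));
      /* CHECK() returns 0 when the current key value exists in the hash */
      if cities.check() = 0 then do;
         CITY_MATCH = 'Y';
         leave;
      end;
   end;

   drop w City_name;
run;
```

With this shape the work per row depends on the number of words in ADDRESS rather than on the number of cities, which is where the gain would come from on a long city list.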
Hi @gamotte,
I did actually come across this too; however, it did not solve my issue, as Rue de Rivendell is still not identified (results attached).
Performance-wise it runs great, as it did before I added the before-city-name array and do loop, but unfortunately it did not identify Rue de.
This seems to work, but looks slightly different. Does everything look OK from your POV, @gamotte?
regex(p)=cats("/\b(&rue )(",City_name,")|\b(",City_name,")( st| ro| av)/i");
The cats function removed the space after &rue. This should work :
regex(p)=cats("/\b(&rue.\s",City_name,"|",City_name,"( st| ro| av))/i");
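For reference, here is a sketch of how the whole step might look with that single combined pattern, so that the separate regex_before array and its loop are no longer needed. CATS strips leading and trailing blanks from each of its arguments, which is why the pattern uses \s for the blank after &rue. The sketch also compiles each pattern once with PRXPARSE and stores the numeric pattern IDs; that is my own addition rather than something proposed in the thread, on the understanding that PRXMATCH recompiles a pattern passed as a non-constant string on every call. The array name rx is made up, and &num_recs is assumed to be defined as in the original code.

```sas
%let rue = rue de;

data CITY_FINAL;
   /* PRXPARSE returns a numeric pattern ID, so this array is numeric */
   array rx (&num_recs) _temporary_;
   array CITY_ARRAY (&num_recs) $200 _temporary_;

   /* Load the city names and compile one combined pattern per city, once */
   if _n_ = 1 then
      do p = 1 to nobs;
         set CITIES point=p nobs=nobs;
         CITY_ARRAY(p) = City_name;
         /* "rue de <city>"  OR  "<city> st/ro/av" in a single regex */
         rx(p) = prxparse(cats("/\b(&rue.\s", City_name, "|", City_name, "( st| ro| av))/i"));
      end;

   set test_data;

   /* One loop now covers both the before and after street cases */
   STREET_MATCH = 'N';
   do i = 1 to nobs;
      if prxmatch(rx(i), ADDRESS) then do;
         STREET_MATCH = 'Y';
         leave;
      end;
   end;

   /* Plain city-name match, unchanged from the original logic */
   CITY_MATCH = 'N';
   do j = 1 to nobs;
      if indexw(upcase(ADDRESS), upcase(CITY_ARRAY(j))) then do;
         CITY_MATCH = 'Y';
         leave;
      end;
   end;

   drop i j City_name;
run;
```

To support more prefixes than rue de, one option would be to put an alternation in the macro variable, for example a value such as (rue de|avenue de), since it is substituted straight into the pattern text. City names containing regex metacharacters would need escaping before being concatenated in.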