BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MR_E
Obsidian | Level 7

Morning World,

 

i am looking to optimise my code by reducing the statement below by removing the additional array match statement(/* before city name street match*/), which i believe can be put into another (/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/) so that it can all be done on the STREET MATCH variable rather than that and BEFORE MATCH

 

The final result gives me what i need - identify an actual city, identify when its actually a street (followed by STREET/ Road etc) and taking into consideration when the street name is before (RUE DE PARIS , Swaziland for example)

 

The code works, but is much slower by adding in the regex before city name clause (worked fast on just city name street clause) - which i expect, but i want to obviously make this optimal. Also i expect more examples than rue de ... so i will need to build this in better also

 

%let rue = rue de;

data CITY_FINAL

array regex (&num_recs) $200 _temporary_;

array CITY_ARRAY (&num_recs) $200 _temporary_;

array regex_before (&num_recs) $200 _temporary_;

* load cities names into list;

if _n_=1 then

do p=1 to nobs;

set CITIES point=p nobs=nobs;

* Build REGEX for this "City_name with street/road/avenue";

regex(p)=cats("/\b(",City_name,")( st| ro| av)/i");

/* Build CITY_ARRAY for "City_name" */

CITY_ARRAY(p)=City_name;

/* Build array for instances where the street name is before the city name i.e rue de "City_name" .*/

regex_before(p)=cats("/\b(&rue )(",City_name,")/i");

 

end;

* keep matches;

set test_data;

/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/

STREET_MATCH = 'N';

found=0;

do i=1 to nobs until(found);

found=prxmatch(regex(i),ADDRESS);

end;

if found THEN

DO;

STREET_MATCH='Y';

end;

/*ADD A MARKER TO IDENTIFY IF THERE IS A MATCH TO THE LISTED CITITES*/

CITY_MATCH = 'N';

/*RUN THROUGH THE ARRAY*/

DO j=1 TO dim(CITY_array);

/*CHECK AGAINST ADDRESS FIELDS FOR MATCH - IS THE CITY PART OF THE ADDRESS FIELD. USE UPPER CASE ON BOTH VARIABLES */

IF indexw(upcase(ADDRESS), upcase(CITY_ARRAY[j])) THEN

DO;

CITY_MATCH='Y';

LEAVE;

END;

end;

/* before city name street match*/

before_MATCH = 'N';

found2=0;

do k=1 to nobs until(found2);

found2=prxmatch(regex_before(k),ADDRESS);

end;

if found2 THEN

DO;

before_MATCH='Y';

end;

run;

 

TEST SAMPLES IF IT HELPS

 

DATA CITIES;

input City_name $char20.;

datalines;

KINGSLANDING

MORDOR

RIVENDELL

WINTERFELL

;

run;

DATA test_data;

input ADDRESS $char80.;

datalines;

HAMBURG

KINGSLANDING

GENEVA

paris

birmingham

Zurich

SINGAPORE

MORDOR - CAN YOU STILL IDENTIFY ME?

RIVENDELL 01 STILL TESTING WTH ADD TXT

TEST MORDOR ST

247 WINTERFELL STREET

MORDOR st

sMORDOR st

MORRDOR road

rue de RIVENDELL

;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
MR_E
Obsidian | Level 7

this seems to work , but looks slight different - everything look ok from your POV? @gamotte

 

regex(p)=cats("/\b(&rue )(",City_name,")|\b(",City_name,")( st| ro| av)/i");

View solution in original post

6 REPLIES 6
AhmedAl_Attar
Rhodochrosite | Level 12

Here is an idea you can try,

 

Replace your Arrays with Hash Objects, and Hash Iterators http://www.lexjansen.com/search/searchresults.php?q=hash%20Object

The Hash Object will provide you with functions to add(), and find(), and this will avoid/replace your Array Looping.

 

Just a different programming approach, hope it helps,

Ahmed

gamotte
Rhodochrosite | Level 12
Hello,


regex(p)=cats("/\b(&rue ",City_name,"|",City_name,"( st| ro| av))/i");
MR_E
Obsidian | Level 7

Hi @gamotte ,

 

I did actually come across this too, however it did not solve my issue as the Rue de Rivendell still is not identified:(attached results)

 

performance wise it runs great, as it did without the before city name array and do loop before, but unfortunately did not identify Rue de

 

 


rivendell.PNG
MR_E
Obsidian | Level 7

this seems to work , but looks slight different - everything look ok from your POV? @gamotte

 

regex(p)=cats("/\b(&rue )(",City_name,")|\b(",City_name,")( st| ro| av)/i");

gamotte
Rhodochrosite | Level 12
I have left work si I can not test but I suspect my answer does not work because of some blanks. Try to remove the leading blank in the définition of &rue.
gamotte
Rhodochrosite | Level 12

The cats function removed the space after &rue. This should work :

 

regex(p)=cats("/\b(&rue.\s",City_name,"|",City_name,"( st| ro| av))/i");

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1379 views
  • 1 like
  • 3 in conversation