BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MR_E
Obsidian | Level 7

Morning World,

 

i am looking to optimise my code by reducing the statement below by removing the additional array match statement(/* before city name street match*/), which i believe can be put into another (/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/) so that it can all be done on the STREET MATCH variable rather than that and BEFORE MATCH

 

The final result gives me what i need - identify an actual city, identify when its actually a street (followed by STREET/ Road etc) and taking into consideration when the street name is before (RUE DE PARIS , Swaziland for example)

 

The code works, but is much slower by adding in the regex before city name clause (worked fast on just city name street clause) - which i expect, but i want to obviously make this optimal. Also i expect more examples than rue de ... so i will need to build this in better also

 

%let rue = rue de;

data CITY_FINAL

array regex (&num_recs) $200 _temporary_;

array CITY_ARRAY (&num_recs) $200 _temporary_;

array regex_before (&num_recs) $200 _temporary_;

* load cities names into list;

if _n_=1 then

do p=1 to nobs;

set CITIES point=p nobs=nobs;

* Build REGEX for this "City_name with street/road/avenue";

regex(p)=cats("/\b(",City_name,")( st| ro| av)/i");

/* Build CITY_ARRAY for "City_name" */

CITY_ARRAY(p)=City_name;

/* Build array for instances where the street name is before the city name i.e rue de "City_name" .*/

regex_before(p)=cats("/\b(&rue )(",City_name,")/i");

 

end;

* keep matches;

set test_data;

/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/

STREET_MATCH = 'N';

found=0;

do i=1 to nobs until(found);

found=prxmatch(regex(i),ADDRESS);

end;

if found THEN

DO;

STREET_MATCH='Y';

end;

/*ADD A MARKER TO IDENTIFY IF THERE IS A MATCH TO THE LISTED CITITES*/

CITY_MATCH = 'N';

/*RUN THROUGH THE ARRAY*/

DO j=1 TO dim(CITY_array);

/*CHECK AGAINST ADDRESS FIELDS FOR MATCH - IS THE CITY PART OF THE ADDRESS FIELD. USE UPPER CASE ON BOTH VARIABLES */

IF indexw(upcase(ADDRESS), upcase(CITY_ARRAY[j])) THEN

DO;

CITY_MATCH='Y';

LEAVE;

END;

end;

/* before city name street match*/

before_MATCH = 'N';

found2=0;

do k=1 to nobs until(found2);

found2=prxmatch(regex_before(k),ADDRESS);

end;

if found2 THEN

DO;

before_MATCH='Y';

end;

run;

 

TEST SAMPLES IF IT HELPS

 

DATA CITIES;

input City_name $char20.;

datalines;

KINGSLANDING

MORDOR

RIVENDELL

WINTERFELL

;

run;

DATA test_data;

input ADDRESS $char80.;

datalines;

HAMBURG

KINGSLANDING

GENEVA

paris

birmingham

Zurich

SINGAPORE

MORDOR - CAN YOU STILL IDENTIFY ME?

RIVENDELL 01 STILL TESTING WTH ADD TXT

TEST MORDOR ST

247 WINTERFELL STREET

MORDOR st

sMORDOR st

MORRDOR road

rue de RIVENDELL

;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
MR_E
Obsidian | Level 7

this seems to work , but looks slight different - everything look ok from your POV? @gamotte

 

regex(p)=cats("/\b(&rue )(",City_name,")|\b(",City_name,")( st| ro| av)/i");

View solution in original post

6 REPLIES 6
AhmedAl_Attar
Ammonite | Level 13

Here is an idea you can try,

 

Replace your Arrays with Hash Objects, and Hash Iterators http://www.lexjansen.com/search/searchresults.php?q=hash%20Object

The Hash Object will provide you with functions to add(), and find(), and this will avoid/replace your Array Looping.

 

Just a different programming approach, hope it helps,

Ahmed

gamotte
Rhodochrosite | Level 12
Hello,


regex(p)=cats("/\b(&rue ",City_name,"|",City_name,"( st| ro| av))/i");
MR_E
Obsidian | Level 7

Hi @gamotte ,

 

I did actually come across this too, however it did not solve my issue as the Rue de Rivendell still is not identified:(attached results)

 

performance wise it runs great, as it did without the before city name array and do loop before, but unfortunately did not identify Rue de

 

 


rivendell.PNG
MR_E
Obsidian | Level 7

this seems to work , but looks slight different - everything look ok from your POV? @gamotte

 

regex(p)=cats("/\b(&rue )(",City_name,")|\b(",City_name,")( st| ro| av)/i");

gamotte
Rhodochrosite | Level 12
I have left work si I can not test but I suspect my answer does not work because of some blanks. Try to remove the leading blank in the définition of &rue.
gamotte
Rhodochrosite | Level 12

The cats function removed the space after &rue. This should work :

 

regex(p)=cats("/\b(&rue.\s",City_name,"|",City_name,"( st| ro| av))/i");

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2647 views
  • 1 like
  • 3 in conversation