BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MR_E
Obsidian | Level 7

Evening All,

 

I have an issue in a large piece of code where the array is not loading all values, it only creates the last possible one, meaning the rest of the array is not creating all possible values for me to match on with INDEX later on.

 

so here is an example:

 

DATA cities1;

input city_name $char50.;

datalines;

HAMBURG

Aleppo

GENEVA

paris

birmingham

Zurich

Coventry

SINGAPORE

PYONGYANG

;

run;

DATA suflist1;

input suffix $char50.;

datalines;

street

road

avenue

port

;

run;

 

data test;

array CITY_ARRAY (&num_recs) $200 _temporary_;

array SUFLIST (&num_recs)$200 _temporary_;

 

* load cities names into list;

if _n_=1 then

do p=1 to nobs;

SET SUFLIST1 point=p nobs=nobs;

SUFLIST(p)=SUFFIX;

set CITIES1 point=p nobs=nobs;

/* Build CITY_ARRAY for "City_name" */

CITY_ARRAY(p)=City_name;

end;

do i=1 to nobs;

name = upcase(catx(" ",(CITY_ARRAY[i]),(SUFLIST[i])));

end;

run;

 

So i want Name to show me the value obviously, so it gives me evidence that all values have been considered, so i would expect to see Hamburg Street , Hamburg Road all the way down to Pyongpang Port, however all i see is the latter, whats happened to the rest of the values? What have i crucially missed off?

 

As mentioned this is part of a bigger matching process with hundreds of values in the datastes im refering too, so a workaround with each individual variable in the query is not feasible

 

much appreciated

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
gamotte
Rhodochrosite | Level 12

Hello,

 

You don't have to test for all possible combinations. If there is no city match then you won't have street match either.

You can test the city match first and if there is a match, check the street types against the matched city :

 

data test;
infile datalines truncover;
input address $100.;
datalines;
10 Hamburg street Rotterdam
5 Wilhelmstrasse Hamburg
;
run;

data want;
    array CITY_ARRAY (&num_cities.) $200 _temporary_;
    array SUFLIST (&num_suffixes.)$200 _temporary_;

    * load cities names into list;
    if _n_=1 then do;
        do p=1 to nobs;
            SET SUFLIST1 point=p nobs=nobs;
            SUFLIST(p)=SUFFIX;
        end;
        do p=1 to nobs2;
            set CITIES1 point=p nobs=nobs2;
            /* Build CITY_ARRAY for "City_name" */
            CITY_ARRAY(p)=City_name;
        end;
    end;

    set test;

    CITY_MATCH="N";
    STREET_MATCH="N";

    do i=1 to nobs2;
        City=upcase(CITY_ARRAY(i));

        found=indexw(upcase(address),City);

        if (found) then do;
            CITY_MATCH="Y";

            do j=1 to nobs;
                StreetType=upcase(SUFLIST(j));

				sourceString=upcase(address);
                searchString=catx(" ",City,StreetType);

				found2=indexw(sourceString,searchString);

                put sourceString= city= streetType= searchString= found2=;

                if found2 then do;
                    STREET_MATCH="Y";
                    leave;
                end;
            end;

            leave;
        end;
    end;
run;

View solution in original post

7 REPLIES 7
MR_E
Obsidian | Level 7

actual code where the issue is if it helps:

where ALL_Address is the the address fields being matched against on the LARGE99 Data set

 

FYI THE CITY MATCH works fine as it should, but the CITY Match is the pain, where i believe this array is not performing as it should...

 

data CITY_test_data (DROP=City_name Country ref) NO_MATCH (DROP=City_name Country ref) POSSIBLE_STREET (DROP=City_name Country ref);

 

array CITY_ARRAY (&num_recs) $200 _temporary_;

array SUFLIST (&num_recs)$200 _temporary_;

 

* load cities names into list;

if _n_=1 then

do p=1 to nobs;

SET SUFLIST point=p nobs=nobs;

SUFLIST(p)=SUFFIX;

set CITIES point=p nobs=nobs;

/* Build CITY_ARRAY for "City_name" */

CITY_ARRAY(p)=City_name;

end;

* match against test data;

set LARGE99_data_PREP;

/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/

STREET_MATCH = 'N';

found=0;

 

do i=1 to nobs until(found);

 

found=indexw(upcase(ALL_ADDRESS), upcase(catx(" ",(CITY_ARRAY[i]),(SUFLIST[i]))));

end;

if found THEN

DO;

STREET_MATCH='Y';

end;

end;

 

/*ADD A MARKER TO IDENTIFY IF THERE IS A MATCH TO THE LISTED CITITES*/

CITY_MATCH = 'N';

/*RUN THROUGH THE ARRAY*/

DO j=1 TO dim(CITY_array);

/*CHECK AGAINST ADDRESS FIELDS FOR MATCH - IS THE CITY PART OF THE ADDRESS FIELD. USE UPPER CASE ON BOTH VARIABLES */

 

IF indexw(upcase(ALL_ADDRESS), upcase(CITY_ARRAY[j])) THEN

DO;

CITY_MATCH='Y';

LEAVE;

END;

end;

 

IF CITY_MATCH = 'Y' AND STREET_MATCH = 'N' THEN OUTPUT CITY_test_data;

IF CITY_MATCH = 'N' AND STREET_MATCH = 'N' THEN OUTPUT NO_MATCH;

IF CITY_MATCH = 'Y' AND STREET_MATCH = 'Y' THEN OUTPUT POSSIBLE_STREET;

run;

ballardw
Super User

Where are you going to get a value for &Num_RECS? Code will fail without a value.

 

SET SUFLIST1 point=p nobs=nobs;

SUFLIST(p)=SUFFIX;

set CITIES1 point=p nobs=nobs;

 

Uses the same variables for POINT AND NOBS. REAL bad idea to reuse variable names from different sets as are you sure which one is being used where?

 

You really need to display what you think the results for this exercise are supposed to be as I suspect you are making it much more difficult than needed.

gamotte
Rhodochrosite | Level 12

If i understand your problem correctly,

- you have different numbers of records in your Cities1 and Suflist1 datasets so you have to use two macrovariables

and two variables for the nobs parameter,

- you have to use nested loops to generate all possible combinations

 

DATA cities1;
    input city_name $char50.;
	call symput("num_cities",_N_);
    datalines;
HAMBURG
Aleppo
GENEVA
paris
birmingham
Zurich
Coventry
SINGAPORE
PYONGYANG
;
run;
DATA suflist1;
    input suffix $char50.;
	call symput("num_suffixes",_N_);
    datalines;
street
road
avenue
port
;
run;
 
data test;
    array CITY_ARRAY (&num_cities.) $200 _temporary_;
    array SUFLIST (&num_suffixes.)$200 _temporary_;

    * load cities names into list;
    if _n_=1 then do;
        do p=1 to nobs;
            SET SUFLIST1 point=p nobs=nobs;
            SUFLIST(p)=SUFFIX;
		end;
		do p=1 to nobs2;
            set CITIES1 point=p nobs=nobs2;
            /* Build CITY_ARRAY for "City_name" */
            CITY_ARRAY(p)=City_name;
        end;
        do i=1 to nobs;
			do j=1 to nobs2;
	            name = upcase(catx(" ",(CITY_ARRAY[j]),(SUFLIST[i])));
				output;
			end;
        end;
	end;
run;
MR_E
Obsidian | Level 7

@gamotte you did understand correctly and it worked as i wanted on the dummy data thankyou, great stuff.

 

However when applying this logic in my full code the data step is taking an age (still running as i write this), could this be because of where i have placed the end statements? or is it down to the fact that &numrecs is 4067 obs and &suf_recs is 206 obs :

 

The end outputs will give me those where it is just a city in the address (i.e Paris (not followed by a suffix), Those which are a street (Paris Street) and those which match neither.

 

This previously worked fine on the city match, and i could prxmatch a few suffixes, but 200 requires something abit more strategic...

 

This is the still running code with your logic applied,

 

data CITY_test_valery_data (DROP=City_name Country ref) NO_MATCH (DROP=City_name Country ref) POSSIBLE_STREET (DROP=City_name Country ref);

/* array regex (&num_recs) $200 _temporary_;*/

array CITY_ARRAY (&num_recs) $200 _temporary_;

array SUFLIST (&suf_recs)$200 _temporary_;

 

* load cities names into list;

if _n_=1 then

do p=1 to nobs;

SET SUFLIST point=p nobs=nobs;

SUFLIST(p)=SUFFIX;

end;

do p=1 to nobs2;

set CITIES point=p nobs=nobs2;

/* Build CITY_ARRAY for "City_name" */

CITY_ARRAY(p)=City_name;

end;

* match against test data;

set LARGE99_data_PREP;

/* DETERMINE IF ITS A MATCH ON A STREET/ROAD/AVENUE NAME*/

STREET_MATCH = 'N';

found=0;

do i=1 to nobs until (found);

do j=1 to nobs2 until(found);

 

what_it_reads = upcase(catx(" ",(CITY_ARRAY[j]),(SUFLIST[i])));

found=indexw(upcase(ALL_ADDRESS), upcase(catx(" ",(CITY_ARRAY[j]),(SUFLIST[i]))));

end; /* are these end statements in the right place */

end; /* are these end statements in the right place */

if found THEN

DO;

STREET_MATCH='Y';

end;

/*ADD A MARKER TO IDENTIFY IF THERE IS A MATCH TO THE LISTED CITITES*/

CITY_MATCH = 'N';

/*RUN THROUGH THE ARRAY*/

DO k=1 TO dim(CITY_array);

/*CHECK AGAINST ADDRESS FIELDS FOR MATCH - IS THE CITY PART OF THE ADDRESS FIELD. USE UPPER CASE ON BOTH VARIABLES */

 

IF indexw(upcase(ALL_ADDRESS), upcase(CITY_ARRAY[k])) THEN

DO;

CITY_MATCH='Y';

LEAVE;

END;

end;

IF CITY_MATCH = 'Y' AND STREET_MATCH = 'N' THEN OUTPUT CITY_test_valery_data;

IF CITY_MATCH = 'N' AND STREET_MATCH = 'N' THEN OUTPUT NO_MATCH;

IF CITY_MATCH = 'Y' AND STREET_MATCH = 'Y' THEN OUTPUT POSSIBLE_STREET;

run;

ballardw
Super User

1) Post code into a code box opened with the {i} icon to preserve formatting

 

2)This code has the potential for never ending

   do i=1 to nobs until (found);
      do j=1 to nobs2 until(found);
          
         what_it_reads = upcase(catx(" ",(CITY_ARRAY[j]),(SUFLIST[i])));
         found=indexw(upcase(ALL_ADDRESS), upcase(catx(" ",(CITY_ARRAY[j]),(SUFLIST[i]))));
      end; /* are these end statements in the right place */
   end; /* are these end statements in the right place */

If the boundary of the city_array and suflist value does not fall on a WORD boundary(a character other than space at begin/end/middle ) in All_address the result of found may be 0 for all records. You may want a simple INDEX. Of if All_address has two or more consecutive spaces in the body then your match may fail. See this example:

 

data example;
   add = " This St,NYC NYC";
   str = "NYC NYC";
   resultw = indexw(add,str);
   put resultw=;
run;

Or this with 2 spaces

data example;
   add = " This St NYC  NYC";
   str = "NYC NYC";
   resultw = indexw(add,str);
   put resultw=;
run;
gamotte
Rhodochrosite | Level 12

Hello,

 

You don't have to test for all possible combinations. If there is no city match then you won't have street match either.

You can test the city match first and if there is a match, check the street types against the matched city :

 

data test;
infile datalines truncover;
input address $100.;
datalines;
10 Hamburg street Rotterdam
5 Wilhelmstrasse Hamburg
;
run;

data want;
    array CITY_ARRAY (&num_cities.) $200 _temporary_;
    array SUFLIST (&num_suffixes.)$200 _temporary_;

    * load cities names into list;
    if _n_=1 then do;
        do p=1 to nobs;
            SET SUFLIST1 point=p nobs=nobs;
            SUFLIST(p)=SUFFIX;
        end;
        do p=1 to nobs2;
            set CITIES1 point=p nobs=nobs2;
            /* Build CITY_ARRAY for "City_name" */
            CITY_ARRAY(p)=City_name;
        end;
    end;

    set test;

    CITY_MATCH="N";
    STREET_MATCH="N";

    do i=1 to nobs2;
        City=upcase(CITY_ARRAY(i));

        found=indexw(upcase(address),City);

        if (found) then do;
            CITY_MATCH="Y";

            do j=1 to nobs;
                StreetType=upcase(SUFLIST(j));

				sourceString=upcase(address);
                searchString=catx(" ",City,StreetType);

				found2=indexw(sourceString,searchString);

                put sourceString= city= streetType= searchString= found2=;

                if found2 then do;
                    STREET_MATCH="Y";
                    leave;
                end;
            end;

            leave;
        end;
    end;
run;
MR_E
Obsidian | Level 7

AWESOME - thank you so much, this is much faster and works as intended - exactly what i was looking for

 

Your help is greatly appreciated, thank you

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 983 views
  • 1 like
  • 3 in conversation