BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SunnySong
Fluorite | Level 6

Dear all, 

 

I am doing some address data cleaning for my company. I managed to get the job done but I am still not satisfied as I know the script can be even simpler. Below is what I have achieved. I used values in street_name table to search in the address column in dataset (address). If there is a match the script will give me a value 1. BTW, the script is copied from our forum. The script works perfectly fine. But my curiosity pushes to go one step further - adding it in a macro statement. So when I run %addr_clean(macro_nm, 1)  it will give me the same result. However, I am stuck at converting array syntax to a macro-understanding format. Please help me out. 

 

DATA STREET_NAME;
INPUT ST_NM $ TYP 4.;
DATALINES;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
RUN;
 
DATA ADDRESS;
INPUT ADDRESS $20.;
DATALINES;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
RUN;
 
PROC SQL NOPRINT;
SELECT "'" || " "|| STRIP(ST_NM) || "'" INTO: STREET SEPARATED BY ', ' FROM STREET_NAME WHERE TYP = 1.;
QUIT;
%PUT &STREET.;
 
DATA STREET_ADDRESS;
   SET ADDRESS;
   ARRAY NN (%sysfunc(countw(QUOTE(&STREET.), %STR(,)))) $ 10 _TEMPORARY_ (&STREET.);
   DO I=1 TO DIM(NN);
      IF FIND(ADDRESS,STRIP(NN[i]),'i') GT 0 THEN DO;
         FOUND=1;
         MATCH=NN[i];
         LEAVE;
    END;
END;
   DROP I;
RUN;
=========================================================
%MACRO ADDR_CLEAN(ST_MACRO, TYP);
PROC SQL NOPRINT;
SELECT "'" || " "|| STRIP(ST_NM) || "'" INTO: &ST_MACRO. SEPARATED BY ', ' FROM STREET_NAME WHERE TYP =&TYP..;
QUIT;
/*I am stuck here and not able to do the rest to fit the code in a macro statement*/;
%MEND;
%ADDR_CLEAN(ST, TYP);

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kathryn_SAS
SAS Employee

Macro is a text processor. To find the strings using macro logic, you need to remove the quotes. Here is some code to get you started.

%MACRO ADDR_CLEAN(ST_MACRO, TYP);
PROC SQL NOPRINT;
SELECT STRIP(ST_NM) INTO: &ST_MACRO. SEPARATED BY ',' FROM STREET_NAME 
 WHERE TYP =&TYP;
QUIT;
/*I am stuck here and not able to do the rest to fit the code in a macro statement*/;
%put &st;
%let count=&sqlobs;
%put &count;

data street_address;
set address;
%do i=1 %to &count;
 %let x=%scan(%bquote(&st),&i,%str(,));
 if find(address,"&x",'i') gt 0 then found=1;
%end;
run;

proc print data=street_address;
run;

%MEND;
%ADDR_CLEAN(ST, 1);

View solution in original post

13 REPLIES 13
Kathryn_SAS
SAS Employee

Macro is a text processor. To find the strings using macro logic, you need to remove the quotes. Here is some code to get you started.

%MACRO ADDR_CLEAN(ST_MACRO, TYP);
PROC SQL NOPRINT;
SELECT STRIP(ST_NM) INTO: &ST_MACRO. SEPARATED BY ',' FROM STREET_NAME 
 WHERE TYP =&TYP;
QUIT;
/*I am stuck here and not able to do the rest to fit the code in a macro statement*/;
%put &st;
%let count=&sqlobs;
%put &count;

data street_address;
set address;
%do i=1 %to &count;
 %let x=%scan(%bquote(&st),&i,%str(,));
 if find(address,"&x",'i') gt 0 then found=1;
%end;
run;

proc print data=street_address;
run;

%MEND;
%ADDR_CLEAN(ST, 1);
SunnySong
Fluorite | Level 6

Hi, Kathryn, 

 

Thank you so much for your in-need support. I was quite busy with an urgent project and didn't have a chance to confirm your solution. Yeah, your code is a lot simpler than mine and it works perfectly. 

 

Quoting and unquoting sometime are really confusing, thanks for valuable advice. I am truly grateful. Your help definitely saved me a lot of time reasearching. You know this is my first post in SAS community and I feel like I am welcomed by our prompt reply. 

 

I will talk to you again when next question comes. LOL. 

 

Regards

 

Sunny 

 

 

 

 

Tom
Super User Tom
Super User

Before you try to use macro logic to generate a SAS program make sure you know what SAS program you want to generate.

 

It looks to me like you are trying to generate one of these three statements:

array nn [6] _temporary_ ('ROAD' 'RD' 'STREET' 'ST' 'PLACE' 'PL');
array nn [1] _temporary_ ('PO_BOX');
array nn [2] _temporary_ ('UNIT' 'SUITE');

The part that varies is the count and the space delimited list of quoted words.

So modify your SQL query so that it generates those two values into macro variables.

proc sql noprint;
select quote(trim(st_nm),"'") into :streetlist separated by ' '
  from street_name
  where typ = &typ
;
%let nstreet=&sqlobs;
quit;

Now you can use NSTREET and STREETLIST macro variables to generate the array statement.

array nn [&nstreet] _temporary_ (&streetlist);

Things to note.

  • PROC SQL already counts the number of observations returned by a query.  So use that instead of trying to re-count them later.
  • Use the QUOTE() function to add quotes around values. It will properly double up any embedded quotes.
  • Use single quotes when generating quoted strings so that any & or % characters in the values are not interpreted by the macro processor.
  • The list of initial values for an array does not need to use commas as the delimiter, spaces work just the same.  And since macro variables with commas in them cause trouble when passed to macro calls or macro function calls it is better to just use spaces.

 

Also note that you should not use STRIP() when generating quoted values from character variables.  The resulting quoted string will not have the leading spaces and so will not accurately reflect the value that was in the original character variable.  Instead use TRIM() to remove only the  trailing spaces.  Not having trailing spaces in the quoted value will not cause an issue in comparison since SAS ignores trailing spaces when comparing strings.   Perhaps not actually wanted in this case since your probably actually do want to remove any leading spaces that might have gotten into the TYP variable. 

 

Also note that if you call the macro like you did:

%ADDR_CLEAN(ST, TYP);

with the second argument set to the string TYP then all of the street types will be included.

 

That is because the resulting where clause of 

where typ = typ

will be true for every observation of the STREET_NAME dataset.

 

SunnySong
Fluorite | Level 6

Hi, Tom, 

 

Thanks for your reply. It is such an elegant piece of code and so easy to understand. With your help, I have had a better understanding of Array. I am a newbie to Array field. But with your demostration, I feel like I am also able to handle this method.

 

 As I said earlier, this is my first post and thank you so much to help me out. I am so thankful. I wish I could learn and grow in this community with great support from those smart people like you and Kathyn. 

 

Once again, thank your for lending me a helping hend when I needed the most. 

 

Keep in touch. 

 

Regards

 

Sunny Song

 

 

SunnySong
Fluorite | Level 6
How come I can only take one post as a solution? You have truly helped me as well.
Tom
Super User Tom
Super User

@SunnySong wrote:
How come I can only take one post as a solution? You have truly helped me as well.

That is how the software tool SAS is using to run the forum works.

 

But it does not matter, the main thing is to make sure and select an answer so future users that are searching for the same type of question will know the thread has an answer and so might be useful for them.

FreelanceReinh
Jade | Level 19

@SunnySong wrote:
How come I can only take one post as a solution? You have truly helped me as well.

I'm going to give Tom's post an extra like to compensate for it not being the accepted solution. :-)

dxiao2017
Pyrite | Level 9

Hi @SunnySong , would prxmatch make the answer easier? The code and results are as follows.

 

However, the loc1=; statement does not work because it seems that the &strlist  macro was not resolved. The loc=; statement worked because I copied the value of the &strlist into the statement manually. My furthur question is: how to make the loc1=prxmatch('/(&strlist)/',address); work, i.e., resolve the &strlist macro properly?

 

data street_name;
   input st_nm $ typ 4.;
   datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
data address;
   input address $20.;
   datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc sql noprint;
select st_nm
   into :strlist separated by '|'
   from street_name
   where typ=1;
quit;
%put &strlist;
/*ROAD|RD|STREET|ST|PLACE|PL*/
data str_address;
   set address;
   loc=prxmatch('/(ROAD|RD|STREET|ST|PLACE|PL)/',address);
   if loc^=0 then found=1;
   loc1=prxmatch('/(&strlist)/',address);
run;
proc print data=str_address;run;

dxiao2017_1-1749737614031.png

SAS help cars; we are cars; that is why my default image;
Tom
Super User Tom
Super User

In general new questions should be raised on a new thread.  You can always include a link back to the old question for context.

 

Note that the macro processors ignore strings bounded on the outside by single quotes.  Use double quotes if you want the macro processor to process the content.

loc1=prxmatch("/(&strlist)/",address);
dxiao2017
Pyrite | Level 9

Hi @Tom thanks for correct the syntax! Change the single quotation mark in prxmatch('//'); was the only thing that I did not try (I even tried something like prxmatch('/("&list")/') because I was not familar with perl (prxmatch) syntax and mistakenly believe that the single quotation mark was a fixed syntax in prxmatch() statement.

 

So now the most efficient and succinct answer (no array, no macro needed) for the question is as follows, and hi @SunnySong , what do you think😀?

 

data street_name;
   input st_nm $ typ 4.;
   datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
data address;
   input address $20.;
   datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc sql noprint;
select st_nm
   into :strlist separated by '|'
   from street_name
   where typ=1;
quit;
data str_address;
   set address;
   loc=prxmatch("/(&strlist)/",address);
   if loc^=0 then found=1;
run;
proc print data=str_address;run;

dxiao2017_0-1749823242077.png

SAS help cars; we are cars; that is why my default image;
SunnySong
Fluorite | Level 6

Hi, dxiao2017, 

 

Thanks for sharing this easy looking but quite advanced code. I do know prxmatch can simplify these searching multiple key words from a long list of strings. I myself even downloaded a tutorial book to learn. But it is not that easy. I only learned some basic knowledge. 

 

Regarding how to get loc1 values, it is in the sql statement. 

 

proc sql noprint;
select st_nm
   into :strlist separated by '|'
   from street_name
   where typ=2;
quit;

Then we can wrap your whole code up with a macro statement %macro street_clean (type). Giving the type value 1, 2, 3, will assign different list of streets. Also Loc0 or Loc1 needs to be changed to Loc&type. to ensure the data results won't be overwritten. 

 

I will test it when I go home and update you. 

 

Once again, thanks for your sharing. Very inspiring. 

 

Regards

 

Sunny Song

dxiao2017
Pyrite | Level 9

Hi @SunnySong , thank you for the feedback! I am glad that I can contribute and help a bit 😀 and best luck in testing your code (the macro you are going to write)!

SAS help cars; we are cars; that is why my default image;
dxiao2017
Pyrite | Level 9

Hi @SunnySong , I think this is the macro you are going to write. "PO_BOX" in the first dataset became "PO BOX" in the second dataset, so the value of the new column in the final output table is missing (it would be "3" if the "PO_BOX" value was the same in the first and second dataset). If you want to deal with this issue (the value in the second dataset is different from those in the first dataset) you need more coding step. You do not need any parameters for the macro. The macro can be simplified a bit because what you want is just the last output table. When finalize the macro do not forget to add the %local statement.

data street_name;
   input st_nm $ typ 4.;
   datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
proc print data=street_name;run;
data address;
   input address $20.;
   datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc print data=address;run;
%macro addressloc;
proc sql;
select distinct typ
   into :typlist
   from street_name;
quit;
%put &typlist &sqlobs;
%let n=&sqlobs;
%put &n;
%do i=1 %to &n;
proc sql;
select st_nm
   into :strlist separated by '|'
   from street_name
   where typ=&i;
quit;
%put &strlist;
data str_address;
   set address;
   set str_address;
   loc=prxmatch("/(&strlist)/",address);
   if loc^=0 then found=&i;
run;
proc print data=str_address;run;
%end;
%mend addressloc;
%addressloc;

dxiao2017_0-1750344114932.png

 

dxiao2017_1-1750344179778.png

 

dxiao2017_2-1750344249553.png

 

dxiao2017_3-1750344308982.png

 

SAS help cars; we are cars; that is why my default image;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 1195 views
  • 5 likes
  • 5 in conversation