Dear all,
I am doing some address data cleaning for my company. I managed to get the job done but I am still not satisfied as I know the script can be even simpler. Below is what I have achieved. I used values in street_name table to search in the address column in dataset (address). If there is a match the script will give me a value 1. BTW, the script is copied from our forum. The script works perfectly fine. But my curiosity pushes to go one step further - adding it in a macro statement. So when I run %addr_clean(macro_nm, 1) it will give me the same result. However, I am stuck at converting array syntax to a macro-understanding format. Please help me out.
Macro is a text processor. To find the strings using macro logic, you need to remove the quotes. Here is some code to get you started.
%MACRO ADDR_CLEAN(ST_MACRO, TYP);
PROC SQL NOPRINT;
SELECT STRIP(ST_NM) INTO: &ST_MACRO. SEPARATED BY ',' FROM STREET_NAME
WHERE TYP =&TYP;
QUIT;
/*I am stuck here and not able to do the rest to fit the code in a macro statement*/;
%put &st;
%let count=&sqlobs;
%put &count;
data street_address;
set address;
%do i=1 %to &count;
%let x=%scan(%bquote(&st),&i,%str(,));
if find(address,"&x",'i') gt 0 then found=1;
%end;
run;
proc print data=street_address;
run;
%MEND;
%ADDR_CLEAN(ST, 1);
Macro is a text processor. To find the strings using macro logic, you need to remove the quotes. Here is some code to get you started.
%MACRO ADDR_CLEAN(ST_MACRO, TYP);
PROC SQL NOPRINT;
SELECT STRIP(ST_NM) INTO: &ST_MACRO. SEPARATED BY ',' FROM STREET_NAME
WHERE TYP =&TYP;
QUIT;
/*I am stuck here and not able to do the rest to fit the code in a macro statement*/;
%put &st;
%let count=&sqlobs;
%put &count;
data street_address;
set address;
%do i=1 %to &count;
%let x=%scan(%bquote(&st),&i,%str(,));
if find(address,"&x",'i') gt 0 then found=1;
%end;
run;
proc print data=street_address;
run;
%MEND;
%ADDR_CLEAN(ST, 1);
Hi, Kathryn,
Thank you so much for your in-need support. I was quite busy with an urgent project and didn't have a chance to confirm your solution. Yeah, your code is a lot simpler than mine and it works perfectly.
Quoting and unquoting sometime are really confusing, thanks for valuable advice. I am truly grateful. Your help definitely saved me a lot of time reasearching. You know this is my first post in SAS community and I feel like I am welcomed by our prompt reply.
I will talk to you again when next question comes. LOL.
Regards
Sunny
Before you try to use macro logic to generate a SAS program make sure you know what SAS program you want to generate.
It looks to me like you are trying to generate one of these three statements:
array nn [6] _temporary_ ('ROAD' 'RD' 'STREET' 'ST' 'PLACE' 'PL');
array nn [1] _temporary_ ('PO_BOX');
array nn [2] _temporary_ ('UNIT' 'SUITE');
The part that varies is the count and the space delimited list of quoted words.
So modify your SQL query so that it generates those two values into macro variables.
proc sql noprint;
select quote(trim(st_nm),"'") into :streetlist separated by ' '
from street_name
where typ = &typ
;
%let nstreet=&sqlobs;
quit;
Now you can use NSTREET and STREETLIST macro variables to generate the array statement.
array nn [&nstreet] _temporary_ (&streetlist);
Things to note.
Also note that you should not use STRIP() when generating quoted values from character variables. The resulting quoted string will not have the leading spaces and so will not accurately reflect the value that was in the original character variable. Instead use TRIM() to remove only the trailing spaces. Not having trailing spaces in the quoted value will not cause an issue in comparison since SAS ignores trailing spaces when comparing strings. Perhaps not actually wanted in this case since your probably actually do want to remove any leading spaces that might have gotten into the TYP variable.
Also note that if you call the macro like you did:
%ADDR_CLEAN(ST, TYP);
with the second argument set to the string TYP then all of the street types will be included.
That is because the resulting where clause of
where typ = typ
will be true for every observation of the STREET_NAME dataset.
Hi, Tom,
Thanks for your reply. It is such an elegant piece of code and so easy to understand. With your help, I have had a better understanding of Array. I am a newbie to Array field. But with your demostration, I feel like I am also able to handle this method.
As I said earlier, this is my first post and thank you so much to help me out. I am so thankful. I wish I could learn and grow in this community with great support from those smart people like you and Kathyn.
Once again, thank your for lending me a helping hend when I needed the most.
Keep in touch.
Regards
Sunny Song
@SunnySong wrote:
How come I can only take one post as a solution? You have truly helped me as well.
That is how the software tool SAS is using to run the forum works.
But it does not matter, the main thing is to make sure and select an answer so future users that are searching for the same type of question will know the thread has an answer and so might be useful for them.
@SunnySong wrote:
How come I can only take one post as a solution? You have truly helped me as well.
I'm going to give Tom's post an extra like to compensate for it not being the accepted solution. :-)
Hi @SunnySong , would prxmatch make the answer easier? The code and results are as follows.
However, the loc1=; statement does not work because it seems that the &strlist macro was not resolved. The loc=; statement worked because I copied the value of the &strlist into the statement manually. My furthur question is: how to make the loc1=prxmatch('/(&strlist)/',address); work, i.e., resolve the &strlist macro properly?
data street_name;
input st_nm $ typ 4.;
datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
data address;
input address $20.;
datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc sql noprint;
select st_nm
into :strlist separated by '|'
from street_name
where typ=1;
quit;
%put &strlist;
/*ROAD|RD|STREET|ST|PLACE|PL*/
data str_address;
set address;
loc=prxmatch('/(ROAD|RD|STREET|ST|PLACE|PL)/',address);
if loc^=0 then found=1;
loc1=prxmatch('/(&strlist)/',address);
run;
proc print data=str_address;run;
In general new questions should be raised on a new thread. You can always include a link back to the old question for context.
Note that the macro processors ignore strings bounded on the outside by single quotes. Use double quotes if you want the macro processor to process the content.
loc1=prxmatch("/(&strlist)/",address);
Hi @Tom thanks for correct the syntax! Change the single quotation mark in prxmatch('//'); was the only thing that I did not try (I even tried something like prxmatch('/("&list")/') because I was not familar with perl (prxmatch) syntax and mistakenly believe that the single quotation mark was a fixed syntax in prxmatch() statement.
So now the most efficient and succinct answer (no array, no macro needed) for the question is as follows, and hi @SunnySong , what do you think😀?
data street_name;
input st_nm $ typ 4.;
datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
data address;
input address $20.;
datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc sql noprint;
select st_nm
into :strlist separated by '|'
from street_name
where typ=1;
quit;
data str_address;
set address;
loc=prxmatch("/(&strlist)/",address);
if loc^=0 then found=1;
run;
proc print data=str_address;run;
Hi, dxiao2017,
Thanks for sharing this easy looking but quite advanced code. I do know prxmatch can simplify these searching multiple key words from a long list of strings. I myself even downloaded a tutorial book to learn. But it is not that easy. I only learned some basic knowledge.
Regarding how to get loc1 values, it is in the sql statement.
proc sql noprint; select st_nm into :strlist separated by '|' from street_name where typ=2; quit;
Then we can wrap your whole code up with a macro statement %macro street_clean (type). Giving the type value 1, 2, 3, will assign different list of streets. Also Loc0 or Loc1 needs to be changed to Loc&type. to ensure the data results won't be overwritten.
I will test it when I go home and update you.
Once again, thanks for your sharing. Very inspiring.
Regards
Sunny Song
Hi @SunnySong , thank you for the feedback! I am glad that I can contribute and help a bit 😀 and best luck in testing your code (the macro you are going to write)!
Hi @SunnySong , I think this is the macro you are going to write. "PO_BOX" in the first dataset became "PO BOX" in the second dataset, so the value of the new column in the final output table is missing (it would be "3" if the "PO_BOX" value was the same in the first and second dataset). If you want to deal with this issue (the value in the second dataset is different from those in the first dataset) you need more coding step. You do not need any parameters for the macro. The macro can be simplified a bit because what you want is just the last output table. When finalize the macro do not forget to add the %local statement.
data street_name;
input st_nm $ typ 4.;
datalines;
ROAD 1
RD 1
STREET 1
ST 1
PLACE 1
PL 1
PO_BOX 3
UNIT 2
SUITE 2
;
run;
proc print data=street_name;run;
data address;
input address $20.;
datalines;
3 ARRON ROAD
15 BONNY RD
7 DAISY STREET
PO BOX 15
UNIT 6
89 ROSE PLACE
9 HOPE ST
;
run;
proc print data=address;run;
%macro addressloc;
proc sql;
select distinct typ
into :typlist
from street_name;
quit;
%put &typlist &sqlobs;
%let n=&sqlobs;
%put &n;
%do i=1 %to &n;
proc sql;
select st_nm
into :strlist separated by '|'
from street_name
where typ=&i;
quit;
%put &strlist;
data str_address;
set address;
set str_address;
loc=prxmatch("/(&strlist)/",address);
if loc^=0 then found=&i;
run;
proc print data=str_address;run;
%end;
%mend addressloc;
%addressloc;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.