How do I extract the first 5 words of a string? I have two address variables and I want to test "if the first 5 words of address1 = the first 5 words of address2".
Thanks in advance!
What kind of "words" and "strings" are we talking about?
An address like address1="4478 English Elm River Road unit 2" and address2="4478 English Elm River Road unit 3".
I want to check that the first 5 words (i.e. "4478 English Elm River Road") are the same in both address1 and address2.
Will it always be 5 words, or does that vary? Will you have addresses with fewer than 5 words?
Have you looked at the COMPGED or SOUNDEX functions?
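For reference, here is a minimal sketch of what a fuzzy comparison with COMPGED could look like on the sample addresses. The dataset name ged_demo and the variable ged are made up for illustration, and any cutoff for "close enough" would have to be tuned on real data. SOUNDEX is not shown.

```sas
data ged_demo;
  length address1 address2 $60;
  address1 = "4478 English Elm River Road unit 2";
  address2 = "4478 English ELm River Road unit 3";
  /* COMPGED returns a generalized edit distance:            */
  /* 0 for identical strings, larger values for bigger gaps. */
  /* UPCASE makes the comparison ignore case.                */
  ged = compged(upcase(address1), upcase(address2));
  put ged=;
run;
```

A small non-zero distance here only says the strings are similar overall; it does not say that the first 5 words agree, so this is a complement to a word-based check rather than a replacement.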
How about a function that counts the matching length, in words:
proc fcmp outlib=sasuser.fcmp.character;
   /* Returns the number of leading words that match, ignoring case */
   function matchWords(str1 $, str2 $);
      j = min(countw(str1), countw(str2));
      /* Walk backwards so j ends up just before the first mismatch */
      do i = j to 1 by -1;
         if upcase(scan(str1, i)) ne upcase(scan(str2, i)) then j = i-1;
      end;
      return (j);
   endsub;
run;
quit;
options cmplib=sasuser.fcmp;
data have;
   address1="4478 English Elm River Road unit 2";
   address2="4478 English Elm River Road unit 3";
   m = matchwords(address1, address2);
   put _all_;
run;
Here is one way to address this. The code sets match='Y' if the first 5 words are the same. The comparison is not case sensitive, and strings with fewer than 5 words are allowed.
data sample;
   length address1 address2 $60;
   retain _prxid;
   /* Compile the regex once: capture the first 5 words, drop the rest, */
   /* and return the captured words concatenated in upper case (\U).    */
   if _n_=1 then
      _prxid=prxparse('s/^\s*(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s*.*/\U\1\2\3\4\5/o');
   address1="4478 English Elm River Road unit 2";
   address2="4478 English ELm River Road unit 3";
   /* Compare the normalized first-5-word keys */
   if prxchange(_prxid,1,address1)=prxchange(_prxid,1,address2) then
      match='Y';
   else match='N';
run;
Life saver!!!! Thank you so much, works perfectly!
Patrick, I have another similar question you might know the solution to. How can I write code to check that address1 is identical to address2 minus its last 2 words?
For example:
address1=123 main st
address2=123 main st bldg A
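One possible approach to this follow-up, as a sketch rather than a tested solution: CALL SCAN with a negative count scans from the right, so a count of -2 locates the second-to-last word of address2, and everything before that position is "address2 minus the last 2 words". The dataset name follow_up and the variables head2, pos and len are made up for illustration. The sketch assumes blank-delimited words and no leading blanks.

```sas
data follow_up;
  length address1 address2 head2 $60 match $1;
  address1 = "123 main st";
  address2 = "123 main st bldg A";
  /* Position of the second-to-last word, scanning from the right. */
  /* pos is 0 when address2 has fewer than 2 words.                */
  call scan(address2, -2, pos, len, ' ');
  /* Keep the text before that word; pos > 1 avoids a zero-length SUBSTR */
  if pos > 1 then head2 = substr(address2, 1, pos - 1);
  else head2 = ' ';
  /* Ignore case and repeated blanks when comparing */
  if pos > 1 and upcase(compbl(strip(address1))) = upcase(compbl(strip(head2)))
    then match = 'Y';
  else match = 'N';
  put head2= match=;
run;
```

When address2 has only one or two words there is nothing left after removing the last two, so the sketch reports match='N' in that case; adjust that branch if an empty address1 should count as a match.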
You may have to register and log in to see this, but someone posted a macro to implement USPS abbreviations if that's helpful at all.
data _null_;
   address1="4478 English Elm River Road unit 2";
   address2="4478 English Elm River Road unit 3";
   call scan(address1,6,p1,l1,' ');
   call scan(address2,6,p2,l2,' ');
   five1=substr(address1,1,p1-2);
   five2=substr(address2,1,p2-2);
   if five1=five2 then put 'YES';
   else put 'NO';
run;
And what happens if the address string is shorter?
address1="4478 English Elm Road";
address2="9999 English Elm Road";
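With fewer than 6 words, CALL SCAN returns a position of 0, so the SUBSTR call in the step above receives an invalid length. A guarded variant might look like the sketch below, which falls back to the whole string when there is no sixth word. The dataset name short_check and the choice to compare whole strings in that case are assumptions, not part of the original post.

```sas
data short_check;
  length address1 address2 five1 five2 $60 match $1;
  address1 = "4478 English Elm Road";
  address2 = "9999 English Elm Road";
  call scan(address1, 6, p1, l1, ' ');
  call scan(address2, 6, p2, l2, ' ');
  /* p = 0 means there is no sixth word, so keep the whole string */
  if p1 > 0 then five1 = substr(address1, 1, p1 - 1);
  else five1 = address1;
  if p2 > 0 then five2 = substr(address2, 1, p2 - 1);
  else five2 = address2;
  /* Ignore case and repeated blanks when comparing */
  if upcase(compbl(strip(five1))) = upcase(compbl(strip(five2))) then match = 'Y';
  else match = 'N';
  put match=;
run;
```

For the two short addresses above the house numbers differ, so the step should report match=N without a SUBSTR error.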