How do I extract the first 5 words of a string? I have two address variables and I want to say "if first 5 words of address1=address2"
Thanks in advance!
What kind of "words" and "strings" are we talking about?
An address like address1="4478 English Elm River Road unit 2" and address2="4478 English Elm River Road unit 3".
I want to check that the first 5 words (i.e. "4478 English Elm River Road") are the same in both address1 and address2.
Will it always be 5 words, or does that vary? Will you have addresses with fewer than 5 words?
Have you looked at compged or soundex functions?
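For reference, a minimal sketch of what a COMPGED check could look like on the sample values from this thread. It compares the whole strings rather than only the first 5 words, and the variable name ged is just an illustration; both inputs are uppercased first so that case differences do not add to the score.

```sas
data _null_;
  address1 = "4478 English Elm River Road unit 2";
  address2 = "4478 English ELm River Road unit 3";
  /* Generalized edit distance between the uppercased strings: */
  /* 0 means identical, larger values mean less similar        */
  ged = compged(upcase(address1), upcase(address2));
  put ged=;
run;
```

A fuzzy score like this is more useful when the addresses may contain typos; for an exact "first 5 words" test, a word-based comparison is the more direct route.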
How about a function that counts the number of leading words that match:
proc fcmp outlib=sasuser.fcmp.character;
  function matchWords(str1 $, str2 $);
    /* Start from the word count of the shorter string */
    j = min(countw(str1), countw(str2));
    /* Walk backwards; the earliest mismatch determines the result */
    do i = j to 1 by -1;
      if upcase(scan(str1, i)) ne upcase(scan(str2, i)) then j = i-1;
    end;
    return (j);
  endsub;
run;
quit;
options cmplib=sasuser.fcmp;
data have;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English Elm River Road unit 3";
  /* m >= 5 means the first 5 words match (case-insensitive) */
  m = matchwords(address1, address2);
  put _all_;
run;
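Here is one way to approach this. The code sets match='Y' if the first 5 words are the same. The comparison is not case sensitive, and strings with fewer than 5 words are allowed. Note that \w matches only letters, digits, and underscores, so words containing punctuation are not handled by this pattern.
data sample;
  length address1 address2 $60;
  retain _prxid;
  /* Compile the regex once: keep the first 5 words, uppercased by \U. */
  /* The captures are separated by blanks so word boundaries still count. */
  if _n_=1 then
    _prxid=prxparse('s/^\s*(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s*.*/\U\1 \2 \3 \4 \5/o');
  address1="4478 English Elm River Road unit 2";
  address2="4478 English ELm River Road unit 3";
  /* Compare the normalized 5-word prefixes */
  if prxchange(_prxid,1,address1)=prxchange(_prxid,1,address2) then
    match='Y';
  else match='N';
run;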
Life saver!!!! Thank you so much, works perfect!
Patrick, I have another similar question you might know the solution to. How can I write code to check that address1 is identical to address2 minus its last 2 words?
For example:
address1=123 main st
address2=123 main st bldg A
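One possible sketch for this follow-up, assuming words are separated by blanks and the comparison should ignore case. The variable names trimmed, n, pos, and len are made up for illustration. CALL SCAN with a negative count scans from the right, so -2 locates the second-to-last word, and everything before it is address2 minus its last 2 words.

```sas
data _null_;
  length address1 address2 trimmed $60;
  address1 = "123 main st";
  address2 = "123 main st bldg A";
  n = countw(address2, ' ');
  if n > 2 then do;
    /* pos = start position of the second-to-last word */
    call scan(address2, -2, pos, len, ' ');
    trimmed = substr(address2, 1, pos - 1);
  end;
  /* With 2 words or fewer, nothing is left after removing the last 2 */
  else trimmed = ' ';
  if upcase(strip(address1)) = upcase(strip(trimmed)) then put 'YES';
  else put 'NO';
run;
```

If the addresses may contain repeated blanks between words, wrapping both sides in COMPBL before comparing would be a reasonable addition.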
You may have to register and log in to see this, but someone posted a macro to implement USPS abbreviations if that's helpful at all.
data _null_;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English Elm River Road unit 3";
  /* p1/p2: start position of the 6th word; l1/l2: its length */
  call scan(address1,6,p1,l1,' ');
  call scan(address2,6,p2,l2,' ');
  /* Keep everything before the blank that precedes the 6th word */
  five1=substr(address1,1,p1-2);
  five2=substr(address2,1,p2-2);
  if five1=five2 then put 'YES';
  else put 'NO';
run;
And what happens if the address string is shorter?
address1="4478 English Elm Road";
address2="9999 English Elm Road";
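With fewer than 6 words, CALL SCAN finds no 6th word and returns a position of 0, so the SUBSTR call in the CALL SCAN approach receives a negative length. A guarded variant might look like the sketch below; it assumes that a string with 5 words or fewer should simply be compared in full, and it adds UPCASE so the check ignores case.

```sas
data _null_;
  length address1 address2 five1 five2 $60;
  address1 = "4478 English Elm Road";
  address2 = "9999 English Elm Road";
  call scan(address1, 6, p1, l1, ' ');
  call scan(address2, 6, p2, l2, ' ');
  /* p = 0 means there is no 6th word, so keep the whole string */
  if p1 > 0 then five1 = substr(address1, 1, p1 - 1);
  else five1 = address1;
  if p2 > 0 then five2 = substr(address2, 1, p2 - 1);
  else five2 = address2;
  if upcase(five1) = upcase(five2) then put 'YES';
  else put 'NO';
run;
```

For the two sample addresses, the street numbers differ, so the comparison is expected to fall into the 'NO' branch.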