How to extract first 5 words from string

Occasional Contributor
Posts: 19
How to extract first 5 words from string

How do I extract the first 5 words of a string? I have two address variables, and I want to test whether the first 5 words of address1 equal the first 5 words of address2.

Thanks in advance!


Accepted Solutions
Solution
‎06-27-2016 04:22 PM
Respected Advisor
Posts: 3,841

Re: How to extract first 5 words from string

Here is one option. The code sets match='Y' if the first 5 words are the same. The comparison is not case sensitive, and strings with fewer than 5 words are allowed.

data sample;
  length address1 address2 $60;
  retain _prxid;
  /* Compile the pattern once: capture the first 5 words, then upcase and concatenate them */
  if _n_=1 then
    _prxid=prxparse('s/^\s*(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s+(\w*)\s*.*/\U\1\2\3\4\5/o');
  address1="4478 English Elm River Road unit 2";
  address2="4478 English ELm River Road unit 3";
  /* Compare the normalized prefixes of the two addresses */
  if prxchange(_prxid,1,address1)=prxchange(_prxid,1,address2) then
    match='Y';
  else match='N';
run;



All Replies
Respected Advisor
Posts: 4,609

Re: How to extract first 5 words from string

What kind of "words" and "strings" are we talking about?

PG
Occasional Contributor
Posts: 19

Re: How to extract first 5 words from string

An address like address1="4478 English Elm River Road unit 2" and address2="4478 English Elm River Road unit 3".

I want to check that the first 5 words (i.e. "4478 English Elm River Road") are the same in both address1 and address2.

Grand Advisor
Posts: 17,464

Re: How to extract first 5 words from string

Will it always be 5 words, or does that vary? Will you have addresses with fewer than 5 words?

Have you looked at the COMPGED or SOUNDEX functions?
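
As a rough sketch of what the COMPGED suggestion could look like: COMPGED returns a generalized edit distance, where 0 means the two strings are identical and larger values mean they are further apart. The addresses are the ones from this thread; the cutoff of 100 (max_ged) is an arbitrary placeholder for illustration, not a recommended value.

```sas
data _null_;
  length address1 address2 $60;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English ELm River Road unit 3";
  /* Hypothetical threshold: tune it against your own data */
  max_ged = 100;
  /* Upcase and strip both values so case and padding do not add to the distance */
  ged = compged(strip(upcase(address1)), strip(upcase(address2)));
  put ged=;
  if ged <= max_ged then put 'Close match';
  else put 'Not a close match';
run;
```

A fuzzy score like this answers a different question than "are the first 5 words equal", so it is probably most useful as a second check for near misses such as "Road" versus "Rd".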

Respected Advisor
Posts: 4,609

Re: How to extract first 5 words from string

How about a function that counts how many leading words match:

proc fcmp outlib=sasuser.fcmp.character;
  /* Returns the number of leading words that str1 and str2 share, ignoring case */
  function matchWords(str1 $, str2 $);
    j = min(countw(str1), countw(str2));
    /* Walk backwards so that j ends up just before the earliest mismatch */
    do i = j to 1 by -1;
      if upcase(scan(str1, i)) ne upcase(scan(str2, i)) then j = i-1;
    end;
    return (j);
  endsub;
run;
quit;

options cmplib=sasuser.fcmp;

data have;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English Elm River Road unit 3";
  m = matchwords(address1, address2);
  put _all_;
run;
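
To turn that count into the yes/no check from the original question, one could compare it with 5. The sketch below assumes matchWords has already been compiled into sasuser.fcmp as shown above. The variable needed and the rule for addresses shorter than 5 words (every word of the longer address must match) are assumptions, not something specified in this thread.

```sas
options cmplib=sasuser.fcmp;

data check;
  length address1 address2 $60 match $1;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English ELm River Road unit 3";
  /* Require 5 matching words, or all words when the longer address has fewer than 5 */
  needed = min(5, max(countw(address1), countw(address2)));
  if matchWords(address1, address2) >= needed then match='Y';
  else match='N';
  put match=;
run;
```

For these two addresses the first six words agree once case is ignored, so the step writes match=Y to the log.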
PG
Occasional Contributor
Posts: 19

Re: How to extract first 5 words from string

Life saver!!!! Thank you so much, it works perfectly!

I have another similar question you might know the solution to. How can I write code to check that address1 is identical to address2 minus its last 2 words?

For example:

address1=123 main st

address2=123 main st bldg A
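
One possible approach, sketched under the assumption that words are separated by blanks and that case should be ignored: rebuild address2 from all but its last 2 words with COUNTW, SCAN and CATX, then compare the result with address1. The variable names trimmed, n and i are made up for this illustration.

```sas
data _null_;
  length address1 address2 trimmed $60;
  address1="123 main st";
  address2="123 main st bldg A";
  /* Number of blank-delimited words in address2 */
  n = countw(address2, ' ');
  trimmed = ' ';
  /* Rebuild address2 without its last 2 words; the loop is skipped when n <= 2 */
  do i = 1 to n-2;
    trimmed = catx(' ', trimmed, scan(address2, i, ' '));
  end;
  /* Normalize blanks and case in address1 before comparing */
  if n > 2 and upcase(compbl(left(address1))) = upcase(trimmed) then put 'YES';
  else put 'NO';
run;
```

With the example values, trimmed becomes "123 main st" and the step writes YES to the log.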

Grand Advisor
Posts: 17,464

Re: How to extract first 5 words from string

You may have to register and log in to see this, but someone posted a macro to implement USPS abbreviations if that's helpful at all.

 

https://listserv.uga.edu/cgi-bin/wa?A2=ind1606c&L=SAS-L&X=A8B00C8C9C39792B2F&Y=fkhurshed%40gmail.com...

 

Grand Advisor
Posts: 9,596

Re: How to extract first 5 words from string

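data _null_;
  address1="4478 English Elm River Road unit 2";
  address2="4478 English Elm River Road unit 3";
  /* p1 and p2 receive the starting position of the 6th blank-delimited word */
  call scan(address1,6,p1,l1,' ');
  call scan(address2,6,p2,l2,' ');
  /* Keep everything before the 6th word (assumes at least 6 words, separated by single blanks) */
  five1=substr(address1,1,p1-2);
  five2=substr(address2,1,p2-2);
  if five1=five2 then put 'YES';
  else put 'NO';
run;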

Respected Advisor
Posts: 3,841

Re: How to extract first 5 words from string

And what happens if the address string is shorter?

  address1="4478 English Elm Road";
  address2="9999 English Elm Road";
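
With only four words there is no sixth word, so CALL SCAN returns a position of 0 and SUBSTR is then called with a negative length, which SAS flags as an invalid argument. A minimal sketch of one way to guard against that, assuming an address with 5 words or fewer should be compared in full (the guard, the blank normalization and the case folding are additions for illustration, not part of the code above):

```sas
data _null_;
  length address1 address2 five1 five2 $60;
  address1="4478 English Elm Road";
  address2="9999 English Elm Road";
  /* Position of the 6th blank-delimited word; 0 when there is none */
  call scan(address1,6,p1,l1,' ');
  call scan(address2,6,p2,l2,' ');
  /* Keep the text before the 6th word, or the whole string if it is shorter */
  if p1>0 then five1=substr(address1,1,p1-1);
  else five1=address1;
  if p2>0 then five2=substr(address2,1,p2-1);
  else five2=address2;
  /* Left-align, collapse repeated blanks and ignore case before comparing */
  if upcase(compbl(left(five1)))=upcase(compbl(left(five2))) then put 'YES';
  else put 'NO';
run;
```

For these two addresses the step writes NO to the log, because the house numbers differ.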
☑ This topic is SOLVED.
