I am interested in measuring the distance between two specific words in a text string, in terms of the number of words between them.
Most of the functions I am aware of work in terms of character positions, for example:
data _null_;
searchhere='residential treatment facility';
fullword=indexw(searchhere,'treatment');
put fullword=;
run;
data _null_;
xyz='She sells seashells? Yes, she does.'; *search for the word she;
whereisShe=findw(xyz,'she');
put whereisShe;
run;
N.B.: I am looking for the distance, i.e. the number of words between 'sick' and 'antibiotics', in the string: Very sick people may only take antibiotics.
Thank you in advance for your kind reply.
Deepak
What if one of the words is repeated? Which count would you want?
What if both words appear in the string multiple times?
What if the "first" word actually occurs after the "second" word?
Is the search to be case sensitive? Should "Sick" match "sick"? (I assume yes, but you should clarify.)
What happens when only one of the words matches?
You may also have to look at the delimiters between words: does a dash in a compound word qualify? Would sick-bed count as "sick"?
Here is a stub of some code that may work; it matches the FIRST occurrence of each word and matches regardless of case.
data _null_;
   xyz='She sells seashells? Yes, she does.'; *search for the word she;
   First = 'She' ;
   Second = 'does' ;
   Firstword=.;
   Secondword=.;
   do i = 1 to (countw(xyz));
      if missing(Firstword) and upcase(First) = upcase(Scan(xyz,i)) then FirstWord=i;
      if missing(Secondword) and upcase(Second) = upcase(Scan(xyz,i)) then Secondword=i;
   end;
   put Firstword= SecondWord=;
run;
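The stub above reports the two word positions but stops short of the distance itself. A minimal sketch of that last step, applied to the sentence from the question, might look like the following. The variable names (text, firstpos, secondpos, between) are illustrative only, and the sketch assumes the default SCAN/COUNTW delimiters, case-insensitive matching, and the first occurrence of each word.

```sas
data _null_;
   length text $200 first second $32;
   text   = 'Very sick people may only take antibiotics.';
   first  = 'sick';
   second = 'antibiotics';
   firstpos  = .;
   secondpos = .;
   /* Walk the words once, using the default SCAN/COUNTW delimiters */
   do i = 1 to countw(text);
      word = upcase(scan(text, i));
      if missing(firstpos)  and word = upcase(first)  then firstpos  = i;
      if missing(secondpos) and word = upcase(second) then secondpos = i;
   end;
   /* Words strictly between the two positions;          */
   /* BETWEEN stays missing if either word was not found */
   if not missing(firstpos) and not missing(secondpos) and firstpos ne secondpos then
      between = abs(secondpos - firstpos) - 1;
   put firstpos= secondpos= between=;
run;
```

For this sentence 'sick' is word 2 and 'antibiotics' is word 7, so the log should show between=4 (people, may, only, take). Using ABS means the order of the two words does not matter; drop it if a "second word before first word" case should be treated differently.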
You could extend the logic to handle words that are found multiple times, but you'll still need to make some assumptions and decisions.
For instance, you can find every occurrence of each word and use an array to store the positions of the first, second, etc. occurrence of each.
This demonstrates getting those values.
You will then need to decide which comparisons of the positions you want.
data _null_;
   xyz='She sells seashells? Yes, she does.'; *search for the word she;
   First = 'She' ;
   Second = 'does' ;
   array firsts (4) f1-f4; /* assumes that the first word won't occur more than 4 times */
   Array seconds (4) s1-s4; /* same assumption for the second word */
   Findex=1; /* these index variables point to where to store the word count in the arrays */
   Sindex=1;
   do i = 1 to (countw(xyz));
      /* the dim() checks skip occurrences beyond the array size instead of failing */
      if upcase(First) = upcase(Scan(xyz,i)) and Findex <= dim(firsts) then do;
         Firsts[Findex] = i;
         Findex = Findex+1;
      end;
      if upcase(Second) = upcase(Scan(xyz,i)) and Sindex <= dim(seconds) then do;
         Seconds[Sindex]=i;
         Sindex = Sindex +1;
      end;
   end;
   /* n() counts the non-missing entries, i.e. how many occurrences were stored */
   do i = 1 to (n(of Firsts(*)));
      put First "occurs in position" +1 Firsts[i] +(-1) '.' @;
      do j = 1 to (n(of seconds(*)));
         put +1 second "occurs in position" +1 seconds[j];
      end;
      put;
   end;
run;
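One possible rule for comparing the stored positions is to take the closest pair, i.e. the smallest number of words between any occurrence of the first word and any occurrence of the second. The sketch below is only one such choice, not the only sensible one. The names text, word, gap and minbetween are illustrative, and it keeps the same assumption of at most 4 stored occurrences per word.

```sas
data _null_;
   length text $200 first second $32;
   text   = 'She sells seashells? Yes, she does.';
   first  = 'She';
   second = 'does';
   array firsts  (4) f1-f4;  /* assumption: at most 4 occurrences are kept */
   array seconds (4) s1-s4;
   findex = 0;               /* number of positions stored so far */
   sindex = 0;
   do i = 1 to countw(text);
      word = upcase(scan(text, i));
      if word = upcase(first) and findex < dim(firsts) then do;
         findex = findex + 1;
         firsts[findex] = i;
      end;
      if word = upcase(second) and sindex < dim(seconds) then do;
         sindex = sindex + 1;
         seconds[sindex] = i;
      end;
   end;
   /* Compare every stored pair and keep the smallest gap;  */
   /* MINBETWEEN stays missing if either word never occurs  */
   minbetween = .;
   do i = 1 to findex;
      do j = 1 to sindex;
         gap = abs(seconds[j] - firsts[i]) - 1;
         if missing(minbetween) or gap < minbetween then minbetween = gap;
      end;
   end;
   put minbetween=;
run;
```

Here "she" occurs as words 1 and 5 and "does" as word 6, so the closest pair is adjacent and the log should show minbetween=0. If the two search words could be identical, the same position would be compared with itself and give -1, so that case would need its own rule.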