DeepakSwain
Pyrite | Level 9

I am interested in measuring the distance between two specific words in a text string, in terms of the number of words between them.

Most of the functions I am aware of give the distance in terms of the number of characters, such as:

 

data _null_;

searchhere='residential treatment facility';

fullword=indexw(searchhere,'treatment');

put fullword=;

run;

data _null_;

xyz='She sells seashells? Yes, she does.'; *search for the word she;

whereisShe=findw(xyz,'she');

put whereisShe;

run;

 

 

 

N.B.: I am looking for the distance, i.e. the number of words, between 'sick' and 'antibiotics' in the string: Very sick people may only take antibiotics.

 

Thank you in advance for your kind reply.
Deepak

Swain
Accepted Solutions
ballardw
Super User

What if one of the words is repeated? Which count would you want?

What if both words appear in the string multiple times?

What if the "first" word actually occurs after the "second" word?

Is the search to be case sensitive? Is "Sick" to match "sick"? (I assume yes, but you should clarify.)

What happens when only one of the words matches?

 

You may also have to look at the delimiters between words: does a dash in a compound word qualify? Would sick-bed count as "sick"?
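
To make the delimiter question concrete, here is a minimal sketch of how the choice of delimiters changes the word count. The sentence and the variable names are made up for illustration. It relies on the default delimiter list of SCAN and COUNTW including the hyphen (as it does on ASCII systems), and the explicit list ' .,?!' is only an assumption about which characters you might want to treat as separators.

```sas
data _null_;
   xyz = 'He took to his sick-bed.';   /* hypothetical example sentence */

   /* Default delimiters include the hyphen, so sick-bed is split into two words */
   n_default = countw(xyz);
   w_default = scan(xyz, 5);

   /* An explicit delimiter list without the hyphen keeps sick-bed as one word */
   n_custom = countw(xyz, ' .,?!');
   w_custom = scan(xyz, 5, ' .,?!');

   put n_default= w_default= n_custom= w_custom=;
run;
```

With the defaults, the fifth word is "sick", so it would match a search for "sick". With the explicit list, the fifth word is "sick-bed" and would not.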

 

A stub of some code that may work: it matches the FIRST occurrence of each word, regardless of case.

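data _null_;
   xyz = 'She sells seashells? Yes, she does.';
   First  = 'She';    /* first word to locate */
   Second = 'does';   /* second word to locate */
   Firstword  = .;    /* word position of First; stays missing if not found */
   Secondword = .;    /* word position of Second; stays missing if not found */
   do i = 1 to countw(xyz);
      /* record only the first case-insensitive match of each word */
      if missing(Firstword) and upcase(First) = upcase(scan(xyz,i)) then Firstword = i;
      if missing(Secondword) and upcase(Second) = upcase(scan(xyz,i)) then Secondword = i;
   end;

   put Firstword= Secondword=;
run;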
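
As a possible alternative to the loop, the FINDW function has an 'E' modifier that counts words up to the match instead of returning a character position, and an 'I' modifier that ignores case. The sketch below applies it to the sentence from the question. The delimiter list ' .,?!;:' and the variable names worddistance and wordsbetween are assumptions, not part of the original code, and like the stub it only finds the first occurrence of each word.

```sas
data _null_;
   xyz = 'Very sick people may only take antibiotics.';

   /* 'e' = return the word number rather than the character position,
      'i' = ignore case; FINDW returns 0 when the word is not found */
   Firstword  = findw(xyz, 'sick',        ' .,?!;:', 'ei');
   Secondword = findw(xyz, 'antibiotics', ' .,?!;:', 'ei');

   if Firstword > 0 and Secondword > Firstword then do;
      worddistance = Secondword - Firstword;       /* difference in word positions */
      wordsbetween = Secondword - Firstword - 1;   /* words strictly in between */
      put Firstword= Secondword= worddistance= wordsbetween=;
   end;
   else put 'The two words were not both found in the expected order.';
run;
```

Counting by hand, 'sick' is word 2 and 'antibiotics' is word 7 in that sentence, so four words (people, may, only, take) lie between them. Decide whether you want the difference in positions or the count of words in between, since the two differ by one.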

3 REPLIES 3
DeepakSwain
Pyrite | Level 9
Hi there,

First of all, I want to thank you for your kind reply. Using your code, I have successfully measured the distance between two specific words as a number of words.

data _null_;
   xyz='She was prescribed exercise and drug. You may visit next week to take further advice about medicine as well as diet';
   First = 'medicine' ;
   Second = 'diet' ;
   Firstword=.;
   Secondword=.;
   worddistance=.;
   do i = 1 to (countw(xyz));
      if missing(Firstword) and upcase(First) = upcase(Scan(xyz,i)) then FirstWord=i;
      if missing(Secondword) and upcase(Second) = upcase(Scan(xyz,i)) then Secondword=i;
   end;
   if Firstword lt Secondword;
   worddistance= SecondWord-Firstword;
   put worddistance=;
   put Secondword= Firstword=;
run;

Once again, thank you for raising some questions that are very relevant to my analysis. To start the discussion, I tried to keep the issue as simple as possible.

* The words are case insensitive.
* If the first word comes after the second word, the string can be filtered out of flagging/analysis by using if Firstword lt Secondword;
* If only one of the two words is present, the string will be automatically filtered out of flagging/analysis, which is desired too.
* The remaining issue is calculating the distance when words occur multiple times, e.g.:

xyz='She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as diet';

The above code does not work for this string. The word 'diet' occurs twice. The code measures the distance for the first "diet" and not for the second "diet". In addition, the condition that the second word must come after the first word excludes this string.

Once again, thank you in advance for your kind guidance.

Regards,
Deepak
Swain
ballardw
Super User

You could extend the logic to find words that occur multiple times, but you'll still need to make some assumptions and decisions.

For instance, you can find out how many times each word occurs and then use an array to store the positions of the first, second, etc. occurrence of each word.

 

This demonstrates getting those values.

You will need to decide your logic on getting which comparisons of the positions you want.

data _null_;

   xyz='She sells seashells? Yes, she does.';
   First = 'She';
   Second = 'does';
   array firsts (4) f1-f4;   /* assumes neither word occurs more than 4 times */
   array seconds (4) s1-s4;
   Findex=1;   /* these index variables point to where the next word position is stored in the arrays */
   Sindex=1;
   do i = 1 to countw(xyz);
      /* the dim() checks skip any occurrences beyond the array size instead of raising a subscript error */
      if upcase(First) = upcase(scan(xyz,i)) and Findex <= dim(firsts) then do;
         firsts[Findex] = i;
         Findex = Findex+1;
      end;
      if upcase(Second) = upcase(scan(xyz,i)) and Sindex <= dim(seconds) then do;
         seconds[Sindex] = i;
         Sindex = Sindex+1;
      end;
   end;

   /* list every stored position of First alongside every stored position of Second */
   do i = 1 to n(of firsts(*));
      put First "occurs in position" +1 firsts[i] +(-1) '.' @;
      do j = 1 to n(of seconds(*));
         put +1 Second "occurs in position" +1 seconds[j];
      end;
      put;
   end;

run;
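
As one example of a rule you might settle on, the sketch below keeps the smallest distance among pairs where the second word comes after the first, using the string with the repeated 'diet' from the previous post. The rule itself and the variable names lastFirst, word and worddistance are assumptions for illustration. It does not need arrays because it only remembers the most recent position of the first word.

```sas
data _null_;
   length word $40;
   xyz = 'She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as diet';
   First  = 'medicine';
   Second = 'diet';
   lastFirst    = .;   /* position of the most recent occurrence of First */
   worddistance = .;   /* smallest distance found so far; missing if no valid pair */

   do i = 1 to countw(xyz);
      word = upcase(scan(xyz, i));
      if word = upcase(First) then lastFirst = i;
      else if word = upcase(Second) and not missing(lastFirst) then do;
         /* Second follows an earlier First: keep the smaller distance (MIN ignores missing) */
         worddistance = min(worddistance, i - lastFirst);
      end;
   end;

   put worddistance=;
run;
```

Counting by hand, 'medicine' is word 17 and the second 'diet' is word 21, so this rule should give a position difference of 4. The first 'diet' (word 6) is ignored because no 'medicine' precedes it. If neither order matters to you, or you want the largest distance instead, the comparison inside the loop would need to change accordingly.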
