DeepakSwain
Pyrite | Level 9

Hi there,

Based on help from the SAS community, I have tried to identify records that contain specific words of interest in a particular order (e.g. medicine first and diet second), where the distance between these words is less than 7 words.

data test;

xyz='She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as as well diet. You must take diet according to your dietician. Later we will think to revise your medicine'; *search for the word she;
First = 'medicine' ;
Second = 'diet' ;
array firsts (3) f1-f3;
Array seconds (3) s1-s3;
Findex=1;
Sindex=1;
do i = 1 to (countw(xyz));
if upcase(First) = upcase(Scan(xyz,i)) then do;
Firsts[Findex] = i;
Findex = Findex+1;
end;
if upcase(Second) = upcase(Scan(xyz,i)) then do;
Seconds[Sindex]=i;
Sindex = Sindex +1;
end;
end;

do i = 1 to (n(of Firsts(*)));
put First "occurs in position" +1 Firsts[i] ;
end;
do j = 1 to (n(of seconds(*)));
put second "occurs in position" +1 seconds[j] ;
end;


if (Firsts(1) lt seconds(1) and seconds(1) - firsts(1) le 6 and Firsts(1) ne . and seconds(1) ne . )
or (Firsts(1) lt seconds(2) and seconds(2) - firsts(1) le 6 and Firsts(1) ne . and seconds(2) ne . )
or (Firsts(1) lt seconds(3) and seconds(3) - firsts(1) le 6 and Firsts(1) ne . and seconds(3) ne . )
or (Firsts(2) lt seconds(1) and seconds(1) - firsts(2) le 6 and Firsts(2) ne . and seconds(1) ne . )
or (Firsts(2) lt seconds(2) and seconds(2) - firsts(2) le 6 and Firsts(2) ne . and seconds(2) ne . )
or (Firsts(2) lt seconds(3) and seconds(3) - firsts(2) le 6 and Firsts(2) ne . and seconds(3) ne . )
or (Firsts(3) lt seconds(1) and seconds(1) - firsts(3) le 6 and Firsts(3) ne . and seconds(1) ne . )
or (Firsts(3) lt seconds(2) and seconds(2) - firsts(3) le 6 and Firsts(3) ne . and seconds(2) ne . )
or (Firsts(3) lt seconds(3) and seconds(3) - firsts(3) le 6 and Firsts(3) ne . and seconds(3) ne . ) ;

run;

 

 

Can somebody suggest simplified code for the colored section (the long IF statement at the end)?

 

Thank you in advance for your kind reply. 

Regards,

Deepak

Swain
Accepted Solution
ballardw
Super User

Look at this example:

data _null_; 

xyz='She was prescribed exercise and diet. You may visit next week to take further advice about medicine as well as as well diet. You must take diet according to your dietician. Later we will think to revise your medicine'; 
   First = 'medicine' ;
   Second ='diet' ;
   array firsts (4)  f1-f4; /* assumes that neither word occurs more than 4 times */
   Array seconds (4) s1-s4; 
   Findex=1;/* these index variables will point to where to store the word count in the arrays*/
   Sindex=1;
   do i = 1 to (countw(xyz));
      if upcase(First) = upcase(Scan(xyz,i)) then do;
         Firsts[Findex] = i;
         Findex = Findex+1;
      end;
      if upcase(Second) = upcase(Scan(xyz,i)) then do;
         Seconds[Sindex]=i;
         Sindex = Sindex +1;
      end;
   end;

   do i = 1 to (n(of Firsts(*)));
      do j = 1 to (n(of seconds(*)));
         if 0< seconds[j]- firsts[i] le 6 then 
            put First "occurs in position" +1 firsts[i] "and"+1 Second "occurs at position" +1 Seconds[j];
      end;
   end;

run;

Also, it may be time to read a bit about arrays and logic constructs. To output just the cases where "diet" is 6 or fewer words after "medicine", compare each pair of values.

 

 

Please post code in the box after selecting the "run" icon above. It will preserve formatting, and the indents really make it much easier to read the nested do loop code frequently needed for arrays.

 

You could look up n(of seconds(*)) to find out that it returns the count of populated (nonmissing) cells in the array, so all of your "and ... ne ." conditions are not needed.

You DID need to add the condition that the position of Second minus the position of First be greater than 0.
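The nested loop can also drive a keep/drop decision instead of PUT statements. The following is a minimal sketch, assuming the goal is to keep only records where Second follows First within six word positions. The data set name matched, the flag variable keep_flag, the shortened test sentence, and the array size of 10 are illustrative choices, not part of the original answer.

```sas
data matched;
   length xyz $300;
   /* illustrative sentence, shorter than the one in the question */
   xyz = 'You may take advice about medicine as well as diet. Later we revise your medicine';
   First  = 'medicine';
   Second = 'diet';
   array firsts  (10) f1-f10;   /* assumption: at most 10 occurrences of each word */
   array seconds (10) s1-s10;
   Findex = 0;                  /* number of positions stored so far */
   Sindex = 0;
   do i = 1 to countw(xyz);
      word = upcase(scan(xyz, i));
      if word = upcase(First) and Findex < dim(firsts) then do;
         Findex = Findex + 1;
         firsts[Findex] = i;
      end;
      if word = upcase(Second) and Sindex < dim(seconds) then do;
         Sindex = Sindex + 1;
         seconds[Sindex] = i;
      end;
   end;

   /* compare every stored pair; one qualifying pair is enough */
   keep_flag = 0;
   do i = 1 to Findex;
      do j = 1 to Sindex;
         if 0 < seconds[j] - firsts[i] <= 6 then keep_flag = 1;
      end;
   end;

   if keep_flag;                /* subsetting IF: output only matching records */
   drop i j word Findex Sindex f1-f10 s1-s10;
run;
```

Because Findex and Sindex already hold the number of stored positions, the loops only visit populated cells, so no "ne ." checks are needed. The dim() guards simply stop storing positions once an array is full rather than raising a subscript error.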


DeepakSwain
Pyrite | Level 9
Hi there,
I am familiar with First. and Last., but the concept of firsts() and seconds() is new to me. In other words, I am new to SAS. Could you kindly point me to some informative material about it to enrich my knowledge? Thank you in advance for your kind reply. Have a nice weekend.
Regards,
Deepak
Swain
PGStats
Opal | Level 21

Pattern matching is ideal for this kind of intricate request:

 

data test;
xyz='She was prescribed exercise and diet. You may visit next week to take 
further advice about medicine as well as as well diet. You must take diet 
according to your dietician. Later we will think to revise your medicine';
run;

data _null_;
set test;
First = 'medicine' ;
do Second = "well", "diet", "you", "must", "take" ;
    interest = prxmatch(cats("/", First, "(\W+\w+){0,6}\W+", Second, "\b/i"), xyz);
    put (First Second interest) (=)/;
    end;
run;

/* 
 Pattern reads : Find First word, followed with 0 to 6 words (a sequence 
 of non-word characters (\W) followed by a sequence of word characters (\w)), 
 followed with a sequence of non-word characters, followed with the 
 Second word, ending on a word boundary (\b). 
 The match is case insensitive (i). 
*/

Edit: new version + comments 

PG
DeepakSwain
Pyrite | Level 9
Hi there,
I am very much interested in exploring the use of Perl regular expressions in this context. Unfortunately, I have little understanding of them, being a novice in SAS. I tried to run the given code but could not understand the importance of "think". I am looking for records where the first word, medicine, is followed by the second word, diet, with fewer than 7 intervening words.
Thank you in advance for your valuable input.
Regards,
Deepak
Swain
PGStats
Opal | Level 21

"think" is a word that is present in the string but more than 7 words away from "medicine", thus the result interest=0.

 

The pattern means: find the word medicine, followed by zero to six (sequences of word letters followed by one or more spaces), followed by spaces and the word diet.
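To see how the quantifier controls the allowed distance, the pattern from the edited code can be run against a few short strings. This is only a sketch: the test sentences and the max_gap variable (the maximum number of intervening words) are made up for illustration, and a leading \b has been added so that First must start on a word boundary.

```sas
data _null_;
   length txt $100;
   First   = 'medicine';
   Second  = 'diet';
   max_gap = 6;   /* assumption: up to 6 intervening words are allowed */
   do txt = 'Take medicine with your diet',
            'Diet first, then medicine',
            'medicine one two three four five six seven diet';
      /* prxmatch returns the start position of the match, or 0 if there is none */
      pos = prxmatch(cats('/\b', First, '(\W+\w+){0,', max_gap, '}\W+', Second, '\b/i'), txt);
      put txt= pos=;
   end;
run;
```

Only the first sentence should give a nonzero position. In the second, diet comes before medicine, and in the third there are seven intervening words, one more than the quantifier allows.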

PG
PGStats
Opal | Level 21

See edited version of my code.

PG
