BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DeepakSwain
Pyrite | Level 9

Hi there,

Following guidance from the sas community, I have been trying to flag records based on word distance criteria. Probably I am doing some mistake in syntax and not achieving success. I have attached a sample of 4 records to test my code given below:

Thank you in advance for your kind reply. 

data test2; 
set test1;
   First = 'negative' ;
   Second ='malignancy' ;
   array firsts (4)  f1-f4; /*assumes 1) that the first word won't occur more than 4 times*/
   Array seconds (4) s1-s4; 
   Findex=1;/* these index variables will point to where to store the word count in the arrays*/
   Sindex=1;
   do i = 1 to (countw(xyz));
      if upcase(First) = upcase(Scan(xyz,i)) then do;
         Firsts[Findex] = i;
         Findex = Findex+1;
      end;
      if upcase(Second) = upcase(Scan(xyz,i)) then do;
         Seconds[Sindex]=i;
         Sindex = Sindex +1;
      end;
   end;

   do i = 1 to (n(of Firsts(*)));
      do j = 1 to (n(of seconds(*)));
         if 0< seconds[j]- firsts[i] le 6 then 
            put First "occurs in position" +1 firsts[i] "and"+1 Second "occurs at position" +1 Seconds[j];
			negative_malignancy=1;
      end;
   end;

run;
Swain
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

At the risk of sounding insistent, I would favor treating this problem with regular expressions (I renamed your file Diagnosis.xls😞

 

libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;

data test2;
First = 'negative' ;
Second ='malignancy' ;
prx = prxParse(cats("/(", First, ")(?:\W+\w+){0,6}\W+(", Second, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
    call prxPosn(prx, 1, firstPos);
    call prxPosn(prx, 2, secondPos);
    end;
drop prx;
run;

proc sql;
select * from test2;
quit;
PG

View solution in original post

10 REPLIES 10
PGStats
Opal | Level 21

At the risk of sounding insistent, I would favor treating this problem with regular expressions (I renamed your file Diagnosis.xls😞

 

libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;

data test2;
First = 'negative' ;
Second ='malignancy' ;
prx = prxParse(cats("/(", First, ")(?:\W+\w+){0,6}\W+(", Second, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
    call prxPosn(prx, 1, firstPos);
    call prxPosn(prx, 2, secondPos);
    end;
drop prx;
run;

proc sql;
select * from test2;
quit;
PG
DeepakSwain
Pyrite | Level 9

Hi there,

Perl regular expression seems to be awesome. Can you modify it to put third word in the criteria for example;

first word= 'negative'

second word='for' 

third word ='malignancy'

the distance between first and second word = 1

the distance between second and third word le 6.

 

Thank you in advance for your kind guidance. Good learning experience.

 

Swain
PGStats
Opal | Level 21

Extended version:

 

libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;

data test3;
First =  'negative' ;
Second = 'for';
Third  = 'malignancy' ;
prx = prxParse(cats("/(", First, ")\W+", Second, "(?:\W+\w+){0,5}\W+(", Third, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
    call prxPosn(prx, 1, firstPos);
    call prxPosn(prx, 2, lastPos);
    end;
drop prx;
run;

proc sql;
select * from test3;
quit;
PG
DeepakSwain
Pyrite | Level 9

Hi PG,

 

For the same situation, the code provided by you is working fanstastic using cats function. Just to have a better understing of the regular expression syntax, I am trying to write the same code without using cats function.  

libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;

data test3;
First =  'negative' ;
Second = 'for';
Third  = 'malignancy' ;
prx = prxParse(cats("/(", First, ")\W+", Second, "(?:\W+\w+){0,5}\W+(", Third, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
    call prxPosn(prx, 1, firstPos);
    call prxPosn(prx, 2, lastPos);
    end;
drop prx;
run;

proc sql;
select * from test3;
quit;

 I am trying to replace the 3 red lines code with a single line code:

prx =

prxParse("/("negative")\W+", for, "(?:\W+\w+){0,5}\W+(" malignancy")\b/i"));

 Due to limited experience, I am doing some syntax error. Can you please help me. 

 

Regards,

Deepak

Swain
PGStats
Opal | Level 21

prx = prxParse("/(negative)\W+for(?:\W+\w+){0,5}\W+(malignancy)\b/io");

PG
DeepakSwain
Pyrite | Level 9

Hi there,

The solution provided is an excellent example of PERL. Being new to SAS, I having limited experience with this language. Unfortunately I have no idea about use of prxparse and cats function together. Is it possible to get some article or information materials showing examples of use of prxparse and cats  function together. I am interested to have better understanding about the syntax so that I can use it more frequently in the future.

 

Thank you in advance for your kind reply. 

 

Deepak

Swain
PGStats
Opal | Level 21

cats is a simple string concatenation function that is used to build the search pattern. Regular expression matching functions are more difficult to master but quite powerful. There are many tutorials available on the net. The following example is a bit easier to understand than the previous:

 

data test;
xyz='She was prescribed exercise and diet. You may visit next week to take 
 further advice about medicine as well as as well diet. You must take diet 
 according to your dietician. Later we will think to revise your medicine';
run;

data _null_;
length pattern $50;
set test;
put xyz=/;
First = 'medicine' ;
do Second = "well", "diet", "you", "must", "take" ;
    pattern = cats("/", First, "(\W+\w+){0,6}\W+", Second, "\b/i");
    interest = prxmatch(pattern, xyz);
    put (pattern interest) (=/)/;
    end;
run;

Good luck!

PG
DeepakSwain
Pyrite | Level 9

Hi PG,

 

Thanks for showing your patience with my silly questions as well as for your kind reply. In last 24 hours I am having a nice learning experience in perl. Now your code is crystal clear to me and I can use similar logic in varoius situations. 

 

Regards,

Deepak

 

Swain
FreelanceReinh
Jade | Level 19

@DeepakSwain: Also, what kind of failure did you encounter? I've run your code and the result seems to be correct and the log is clean. Of course, I had to name the character variable XYZ, not DIAGNOSIS, to make the code work.

DeepakSwain
Pyrite | Level 9

Thank you for your feedback. Appreciated. 

Swain

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3853 views
  • 2 likes
  • 3 in conversation