Hi there,
Following guidance from the sas community, I have been trying to flag records based on word distance criteria. Probably I am doing some mistake in syntax and not achieving success. I have attached a sample of 4 records to test my code given below:
Thank you in advance for your kind reply.
data test2;
set test1;
First = 'negative' ;
Second ='malignancy' ;
array firsts (4) f1-f4; /*assumes 1) that the first word won't occur more than 4 times*/
Array seconds (4) s1-s4;
Findex=1;/* these index variables will point to where to store the word count in the arrays*/
Sindex=1;
do i = 1 to (countw(xyz));
if upcase(First) = upcase(Scan(xyz,i)) then do;
Firsts[Findex] = i;
Findex = Findex+1;
end;
if upcase(Second) = upcase(Scan(xyz,i)) then do;
Seconds[Sindex]=i;
Sindex = Sindex +1;
end;
end;
do i = 1 to (n(of Firsts(*)));
do j = 1 to (n(of seconds(*)));
if 0< seconds[j]- firsts[i] le 6 then
put First "occurs in position" +1 firsts[i] "and"+1 Second "occurs at position" +1 Seconds[j];
negative_malignancy=1;
end;
end;
run;
At the risk of sounding insistent, I would favor treating this problem with regular expressions (I renamed your file Diagnosis.xls😞
libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;
data test2;
First = 'negative' ;
Second ='malignancy' ;
prx = prxParse(cats("/(", First, ")(?:\W+\w+){0,6}\W+(", Second, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
call prxPosn(prx, 1, firstPos);
call prxPosn(prx, 2, secondPos);
end;
drop prx;
run;
proc sql;
select * from test2;
quit;
At the risk of sounding insistent, I would favor treating this problem with regular expressions (I renamed your file Diagnosis.xls😞
libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;
data test2;
First = 'negative' ;
Second ='malignancy' ;
prx = prxParse(cats("/(", First, ")(?:\W+\w+){0,6}\W+(", Second, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
call prxPosn(prx, 1, firstPos);
call prxPosn(prx, 2, secondPos);
end;
drop prx;
run;
proc sql;
select * from test2;
quit;
Hi there,
Perl regular expression seems to be awesome. Can you modify it to put third word in the criteria for example;
first word= 'negative'
second word='for'
third word ='malignancy'
the distance between first and second word = 1
the distance between second and third word le 6.
Thank you in advance for your kind guidance. Good learning experience.
Extended version:
libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;
data test3;
First = 'negative' ;
Second = 'for';
Third = 'malignancy' ;
prx = prxParse(cats("/(", First, ")\W+", Second, "(?:\W+\w+){0,5}\W+(", Third, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
call prxPosn(prx, 1, firstPos);
call prxPosn(prx, 2, lastPos);
end;
drop prx;
run;
proc sql;
select * from test3;
quit;
Hi PG,
For the same situation, the code provided by you is working fanstastic using cats function. Just to have a better understing of the regular expression syntax, I am trying to write the same code without using cats function.
libname xl Excel "&sasforum\datasets\diagnosis.xls" access=readonly;
data test3;
First = 'negative' ;
Second = 'for';
Third = 'malignancy' ;
prx = prxParse(cats("/(", First, ")\W+", Second, "(?:\W+\w+){0,5}\W+(", Third, ")\b/i"));
set xl.'test1$'n;
if prxMatch(prx, diagnosis) then do;
call prxPosn(prx, 1, firstPos);
call prxPosn(prx, 2, lastPos);
end;
drop prx;
run;
proc sql;
select * from test3;
quit;
I am trying to replace the 3 red lines code with a single line code:
prx =
prxParse("/("negative")\W+", for, "(?:\W+\w+){0,5}\W+(" malignancy")\b/i"));
Due to limited experience, I am doing some syntax error. Can you please help me.
Regards,
Deepak
prx = prxParse("/(negative)\W+for(?:\W+\w+){0,5}\W+(malignancy)\b/io");
Hi there,
The solution provided is an excellent example of PERL. Being new to SAS, I having limited experience with this language. Unfortunately I have no idea about use of prxparse and cats function together. Is it possible to get some article or information materials showing examples of use of prxparse and cats function together. I am interested to have better understanding about the syntax so that I can use it more frequently in the future.
Thank you in advance for your kind reply.
Deepak
cats is a simple string concatenation function that is used to build the search pattern. Regular expression matching functions are more difficult to master but quite powerful. There are many tutorials available on the net. The following example is a bit easier to understand than the previous:
data test;
xyz='She was prescribed exercise and diet. You may visit next week to take
further advice about medicine as well as as well diet. You must take diet
according to your dietician. Later we will think to revise your medicine';
run;
data _null_;
length pattern $50;
set test;
put xyz=/;
First = 'medicine' ;
do Second = "well", "diet", "you", "must", "take" ;
pattern = cats("/", First, "(\W+\w+){0,6}\W+", Second, "\b/i");
interest = prxmatch(pattern, xyz);
put (pattern interest) (=/)/;
end;
run;
Good luck!
Hi PG,
Thanks for showing your patience with my silly questions as well as for your kind reply. In last 24 hours I am having a nice learning experience in perl. Now your code is crystal clear to me and I can use similar logic in varoius situations.
Regards,
Deepak
@DeepakSwain: Also, what kind of failure did you encounter? I've run your code and the result seems to be correct and the log is clean. Of course, I had to name the character variable XYZ, not DIAGNOSIS, to make the code work.
Thank you for your feedback. Appreciated.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.