BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
larsc
Obsidian | Level 7

Hey SAS Community,

 

Asked a question yesterday, but wanted to create a new post which contains more of the specifics I am working with.

 

I have two input tables:

 

The first table (denoted Have1) contains the columns CodeName and CodeKeywords in the following format:

 

CodeName                                        CodeKeywords

AIS  TEST                                          ais-test | pressure-drop

FIELD INSTRUMENT ERROR          psd

GVL TEST                                         GVL drop

LIFT ISSUE                                       lift | slackline

SOLID LOADING                              pbu

PRODUCTION BOTTLENECK         resd 

 

These keywords are entered by engineers in an application and are retrieved by a PROC HTTP call, as such I cannot guarantee the uniqueness of the keywords. As such, I am happy for the first or last match to be the assigned Code in the event of multiple matches. Moreover, the keywords are set up for use with prxmatch such that a keyword like "ais-test | pressure-drop" would match comments containing either "ais-test" OR "pressure-drop".

 

The second table (denoted Have2) contains the columns Date and Comment - where the comment is text written by an engineer and can be missing:

Date                         Daily Comment

27MAR2021             Encountered issues with PSD
27MAR2021              
27MAR2021             Had issues with the lift system, no actions taken
27MAR2021             Experienced pressure-drop
27MAR2021             RESD during morning routine
27MAR2021             No issues encountered

 

So from the above tables, my desired output would be as follows:

 

Date                          Daily Comment                                                             CodeSugg

27MAR2021              Encountered issues with PSD                                        FIELD INSTRUMENT ERROR  
27MAR2021              
27MAR2021              Had issues with the lift system, no actions taken           LIFT ISSUE  
27MAR2021              Experienced pressure-drop                                            AIS  TEST 
27MAR2021              RESD during morning routine                                        PRODUCTION BOTTLENECK
27MAR2021              No issues encountered

 

As illustrated by the above tables, I use the keywords to search the comments for a match - for instance, when finding the word RESD in the "RESD during morning routine" comment the comment should be labelled by the Production Bottleneck code. Until now, I've used prxmatch for this purpose. Additionally, I'd like the resulting Want table to have the same number of rows as the Have2 table (i.e., all comments present in resulting Want table). Note also that if there is no match, no Label should be added in the CodeSugg column for the corresponding comment value.

 

Previously I hardcoded this and it was working satisfactorily:

 

 

 

data want;
	set have;
	%if %symexist(ASV_FAILURE) %then %do;
		if prxmacth("/&ASV_FAILURE./i", loss_comment) then CODE_SUGG = "AVS Failure";
	%end;
run;

 

 

 

I am now trying to create a more "dynamic" code to facilitate for new unknown codes which may be added in the future. For this purpose I've created the following script:

 

 

 

proc sql noprint;
	select count(*)
   into :rowCountHave1
   from Have1;
quit;

%macro suggestCode();
	
	data want;

		%let iter = 1;
		
		set Have2;
		%let comment=DAILY_COMMENT;
	
		%do %while (&iter. <= &rowCountHave1.);
			set Have1(firstobs=&iter. obs=&iter.);

				%let keyword=strip(CodeKeywords);
				%let code=strip(CodeName);
			
				%if prxmatch("/%str(&keyword.)/i", &comment.) %then CODESUGG=&lcode.;;

				%let iter=%eval(&iter.+1);
		%end;
		
	run;

%mend;

%suggestCode();

 

 

 

The rationale behind the above code was that for each comment I would iterate through the table containing the codes and keywords and check for a match. Disclaimer - I have very little experience with looping in SAS and SAS programming in general and this code is currently not working as expected and I encounter the following error without really understanding why:

 

- ERROR: Required operator not found in expression: prxmatch("/&keyword./i.", &comment.) - ERROR: The macro SUGGESTCODE will stop executing

 

I figured the easiest solution for my task was to use a do-while loop for each row in the table (Have1) containing codes and keywords and check for a match for each comment in the Have2 table. Would greatly appreciate any suggestions and comments on how far I am off to get this approach to work.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Something like this should work (untested, please adapt):

data WANT;
  if 0 then set HAVE1 nobs=NOBS;
  set HAVE2;
  do I=1 to NOBS;
    set HAVE1 point=I;
    if prxmatch(catt('/',CODEKEYWORDS,'/i'), trim(COMMENT) ) then CODESUGG=CODENAME;
  end;
run;

Another method could use a hash table, but the syntax above is simpler.

or if the HAVE1 table is large, load it beforehand using SASFILE. 

View solution in original post

10 REPLIES 10
Reeza
Super User
How do you know that first one matches with AIS TEST?

Are you set on macros? I would have assumed a temporary array would be a touch more effective/faster here.
larsc
Obsidian | Level 7

Hi Reeza, sorry that was an "end of the day" error by me - it should always be matching the code containing the keyword. In this case it should match the code which has the keyword "psd".

 

I am not set on using macros - the reason it is set up in a (likely) less-then-effective manner can probably be attributed to my lack of experience with the SAS programming language. This approach was the one that I intuitively thought of and I have not yet worked with arrays in SAS.

ballardw
Super User

How consistent are these for containing the "key words"? I ask because your first example value of " Encountered issues with PSD" contains none of the "keywords" of "ais -test" or "pressure drop". if PSD stands for "pressure drop" then it appears that you should have "PSD" in the keywords for Ais Test. But you have PSD in the keywords for "Field Instrument Error" and that is not the result for the data value of " Encountered issues with PSD".

 

It seems like your rules may not be firm or rigorous enough to be coding yet.

 

Would "slack line" be considered the same as "slackline"?

larsc
Obsidian | Level 7

HI Ballardw,

 

Like mentioned in the post above, the first instance was an error by me. The example has been updated and is now labelled appropriately.

 

I am actually a bit unsure if "slack line" would match "slackline", I'd have to perform a test when getting to work. However, for the purpose of my code you can assume that it should be exact text matching.

ballardw
Super User

Here is an example of one way to use a temporary array to search for specific "keywords" in text.

data textlines;
   infile datalines truncover;
   input text $ 1-30 ;
datalines;
Some string value
Another text line
Junk Junk Junk
;

data arrayexample;
   set textlines;
   array key (5) $ 15 _temporary_ ('TEXT' 'LINE' 'STRING' 'JUNK' 'ABC');
   array r   (5) ;
   do i=1 to dim(key);
      r[i]= index(upcase(text),strip(key[i]))>0;
   end;
   drop i;
run;

In this case I am placing a result of "found" into a 1/0 coded result variable r where 1 will indicate 'word found'. Note that R1=1 means the first word by order in the Key list is found, R2=2 means the second and so one.

I start with this because we have no idea just how complex your comments might be and this will find the cases where multiple keywords are found.

For those meanings with multiple key words a separate pass through the R array variables could be used to set another variable using the Max function with two or more R variables such as:

Text_or_line = max(R[1], R[2]); to create 1/0 flag numeric variable or if you absolutely NEVER have multiple code values you could use a series of IF/Then such as this to set a single value:

 

If max(R[1],R[2]) = 1 then codevalue='Text or line found';

Else if R[3]=1    then codevalue= 'String found';

else if R[4]=1   then codevalue= 'Junk found';

else if R[5]=1  then codevalue= 'ABC found';

 

Which could also be written as a Select

   select ;
      When (R[1],R[2]) codevalue='Text or line found';
      When (R[3])      codevalue= 'String found';
      When (R[4])      codevalue= 'Junk found';
      When (R[5])      codevalue= 'ABC found';
      otherwise;
   end;

With either of these you likely want to set a length for the "Codevalue" variable before assigning a value to control the length of the text.

The _temporary_ in the Key array means that the values are not added to the data set. When creating such an array you need to make sure that specify a number of elements (5) in the example, the length of the longest text value, $ 15 in the example, and the values in a desired order. Note that each individual value is separate, no "list" unless you want to parse each value as well. The comparison I used, INDEX will find "string" if it appears in "stringcombo" so you may want the INDEXW or FINDW function instead. Strip is used on the Key values because some of the comparisons will "pad" the value to the length of the defined value not find "string" because it is looking for "string         ". Strip removes the trailing padded values. I'm sort of old school and when looking for exact matches pretty much always force case to match. The function FINDW has a setting for ignore case for comparisons.

 

 

larsc
Obsidian | Level 7

Thank you for your thorough response @ballardw, I have not worked with arrays in SAS before so this was a very interesting approach.

 

I will try and implement a working example from the code you have created based on my data. 

ChrisNZ
Tourmaline | Level 20

Something like this should work (untested, please adapt):

data WANT;
  if 0 then set HAVE1 nobs=NOBS;
  set HAVE2;
  do I=1 to NOBS;
    set HAVE1 point=I;
    if prxmatch(catt('/',CODEKEYWORDS,'/i'), trim(COMMENT) ) then CODESUGG=CODENAME;
  end;
run;

Another method could use a hash table, but the syntax above is simpler.

or if the HAVE1 table is large, load it beforehand using SASFILE. 

larsc
Obsidian | Level 7

Thanks @ChrisNZ, this worked like a charm! I like the simplicity of the syntax as it will ease the possible debugging other team members will have to perform in the future.

 

Out of curiosity, and with no expactation for you to spend more time on my issue, I tried a different apporach for solving my issue for my data using macro functions:

 


%macro loopOverComments();

	data WANT;
		set HAVE2;
		
		call symputx('comment', COMMENT, 'g');
		call execute('%findCode');

		if symexist('code') then do;
			sugg_loss = symget('code');
			call symdel('code');
		end;

	run;

%mend;

%macro findCode();

	data temp;
		set HAVE1;

		call symputx('keywords', CodeKeywords, 'g');

		if prxmatch("/&keywords/i", "&comment") then do;
			call symputx('code', CodeName, 'g');
			stop;
		end;

	run;

%mend;

%loopOverComments();	

Would you happen to know why this approach is not working? I have tested the macro variables and, for instance, the "comment" macro is available in the %findcode macro function.

ballardw
Super User

Call execute stacks up commands to execute after the data step finishes.

It looks like you are expecting the result of one data step, the "if symexist('code')  to be available in the other and the timing is just not there.

There is another issue about the timing of using a macro variable created in a data step later in the same data step.

 

Run the code with: options mprint; set and you may see more details about how the macro calls are behaving.

ChrisNZ
Tourmaline | Level 20

Would you happen to know why this approach is not working?

As mentioned call execute stacks the code to run after the data step, unless it's pure macro code.

Consider: 

%let a=0;
data _null_;
  RC= dosubl(" data _null_; call symput('a','1'); run; ") ;
  A=symget('a');
  put A=;
run;
data _null_;
  call execute(" %let a=2; ") ;
  A=symget('a');
  put A=;
run;

A=1

A=2

 

 

 

sas-innovate-2024.png

 

Time is running out to save with the early bird rate. Register by Friday, March 1 for just $695 - $100 off the standard rate.

 

Check out the agenda and get ready for a jam-packed event featuring workshops, super demos, breakout sessions, roundtables, inspiring keynotes and incredible networking events. 

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1414 views
  • 0 likes
  • 4 in conversation