BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ynchen08
Fluorite | Level 6

Hi all,

 

I would like to create an indicator variable for successfully matching all the sub-string. I can do the following, for example:

 

if count(source_var, "substring1") >0 and count(source_var, "substring2")>0 and count(source_var, "substring3")>0 then indicator=1;

else indicator=0;

 

But I believe there is a much quicker way to do it using Prxmatch syntax....please advise if you have a sample Prxmatch code to achieving this task in SAS. Thank you very much in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Medication names are not what I would think of are the best candidates for PRX as the names are not consistent.

 

indicator = (indexw(source_var,"ABACAVIR",'/') >0 and indexw(source_var,"EMTRICITABINE",'/') >0 );

should work for two to find both and extend with more if need more in an "and" useage, or nesting of () with or conditions.

 

How many is a 'Large number of meds' and are you discussing mutliple regimens in combination? 

Here may be one way to work with varying number of concerned lists.

data example;
   informat source_var $200.;
   input source_var;
   /* the $25 below should be set large enough to hold the longest names expected*/
   array reg1 {2} $25 _temporary_ ("ABACAVIR", "EMTRICITABINE");
   array reg2 {3} $25 _temporary_ ("ABACAVIR","CIPROFLOXACIN", "DOXYCYCLINE");
   /* strip is needed as the temporary array elements get padded to the length of the set length*/
   /* use indexw to reduce chance of compounds considered matches*/
   indreg1 = ( indexw(source_var,strip(reg1[1]),'/') >0 and indexw(source_var,strip(reg1[2]),'/') >0 );
   /* loop to go over longer lists*/
   counter=0;
   do i = 1 to dim(reg2);
      counter = sum(counter,( indexw(source_var,strip(reg2[i]),'/') >0));
   end;
   indreg2= (counter=3); /* 3 indicates all 3 are found*/
   drop i counter;
   label 
      indreg1='Indicator for regime 1'
      indreg2='Indicator for regime 2'
   ;
        
datalines;
ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR
ABACAVIR/COBICISTAT/ELVITEGRAVIR/CIPROFLOXACIN/TENOFOVIR
ABACAVIR/DOXYCYCLINE/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR
ABACAVIR/DOXYCYCLINE/CIPROFLOXACIN/EMTRICITABINE/TENOFOVIR
;
run;

 

Note that INDEXW is case sensitive so you may want to upcase your source_var.

View solution in original post

6 REPLIES 6
ramya_sahu
Obsidian | Level 7

Hi,

Try below

 

if prxmatch ("m/substring1|substring2|substring3/i",source_var) > 0 then indicator=1 else indicator =0 ;

ynchen08
Fluorite | Level 6
Thanks for your input! I tried your code, but it seems to address the situation where at least one of the substrings is found in source_var, as opposed to that all of the substrings are found in source_var (which is what I want to do). In other words, the code is more similar to count(source_var, "substring1") >0 OR count(source_var, "substring2")>0 OR count(source_var, "substring3")>0
ballardw
Super User

You might post some example data and the searched substrings. If the searched substrings follow patterns then the PRX approach would likely work but if the targets are things like "This String", "bdQ rx - 12345" and "(**)brg" then PRX arent' going to help.

 

If you don't have more that a few of these targets you may want to look at:

indicator= (sum( count(source_var, "substring1") >0 , count(source_var, "substring2")>0, count(source_var, "substring3")>0) > 0);

 

or the index function as count may be overkill if presence is the only requirement.

 

Also is case  insensitivity involved? 

ynchen08
Fluorite | Level 6
Thank you for your feedback!

Basically, I have this text string variable for medication regimen (for example, "ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR"). Each medication is separated by "/". I want to create an indicator variable to capture the medication regimen that includes "ABACAVIR" and "EMTRICITABINE" ( the order does not matter). That's why I was using count() function but I thought there is probably a much quicker way to do it without having to do multiple count() or index() functions in a statement, especially to save time when I need to search for a large numbers of meds.

ballardw
Super User

Medication names are not what I would think of are the best candidates for PRX as the names are not consistent.

 

indicator = (indexw(source_var,"ABACAVIR",'/') >0 and indexw(source_var,"EMTRICITABINE",'/') >0 );

should work for two to find both and extend with more if need more in an "and" useage, or nesting of () with or conditions.

 

How many is a 'Large number of meds' and are you discussing mutliple regimens in combination? 

Here may be one way to work with varying number of concerned lists.

data example;
   informat source_var $200.;
   input source_var;
   /* the $25 below should be set large enough to hold the longest names expected*/
   array reg1 {2} $25 _temporary_ ("ABACAVIR", "EMTRICITABINE");
   array reg2 {3} $25 _temporary_ ("ABACAVIR","CIPROFLOXACIN", "DOXYCYCLINE");
   /* strip is needed as the temporary array elements get padded to the length of the set length*/
   /* use indexw to reduce chance of compounds considered matches*/
   indreg1 = ( indexw(source_var,strip(reg1[1]),'/') >0 and indexw(source_var,strip(reg1[2]),'/') >0 );
   /* loop to go over longer lists*/
   counter=0;
   do i = 1 to dim(reg2);
      counter = sum(counter,( indexw(source_var,strip(reg2[i]),'/') >0));
   end;
   indreg2= (counter=3); /* 3 indicates all 3 are found*/
   drop i counter;
   label 
      indreg1='Indicator for regime 1'
      indreg2='Indicator for regime 2'
   ;
        
datalines;
ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR
ABACAVIR/COBICISTAT/ELVITEGRAVIR/CIPROFLOXACIN/TENOFOVIR
ABACAVIR/DOXYCYCLINE/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR
ABACAVIR/DOXYCYCLINE/CIPROFLOXACIN/EMTRICITABINE/TENOFOVIR
;
run;

 

Note that INDEXW is case sensitive so you may want to upcase your source_var.

ynchen08
Fluorite | Level 6
Thank you very much for the clarification.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1404 views
  • 1 like
  • 3 in conversation