- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I would like to create an indicator variable for successfully matching all the sub-string. I can do the following, for example:
if count(source_var, "substring1") >0 and count(source_var, "substring2")>0 and count(source_var, "substring3")>0 then indicator=1;
else indicator=0;
But I believe there is a much quicker way to do it using Prxmatch syntax....please advise if you have a sample Prxmatch code to achieving this task in SAS. Thank you very much in advance!
- Tags:
- prxmatch
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Medication names are not what I would think of are the best candidates for PRX as the names are not consistent.
indicator = (indexw(source_var,"ABACAVIR",'/') >0 and indexw(source_var,"EMTRICITABINE",'/') >0 );
should work for two to find both and extend with more if need more in an "and" useage, or nesting of () with or conditions.
How many is a 'Large number of meds' and are you discussing mutliple regimens in combination?
Here may be one way to work with varying number of concerned lists.
data example; informat source_var $200.; input source_var; /* the $25 below should be set large enough to hold the longest names expected*/ array reg1 {2} $25 _temporary_ ("ABACAVIR", "EMTRICITABINE"); array reg2 {3} $25 _temporary_ ("ABACAVIR","CIPROFLOXACIN", "DOXYCYCLINE"); /* strip is needed as the temporary array elements get padded to the length of the set length*/ /* use indexw to reduce chance of compounds considered matches*/ indreg1 = ( indexw(source_var,strip(reg1[1]),'/') >0 and indexw(source_var,strip(reg1[2]),'/') >0 ); /* loop to go over longer lists*/ counter=0; do i = 1 to dim(reg2); counter = sum(counter,( indexw(source_var,strip(reg2[i]),'/') >0)); end; indreg2= (counter=3); /* 3 indicates all 3 are found*/ drop i counter; label indreg1='Indicator for regime 1' indreg2='Indicator for regime 2' ; datalines; ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR ABACAVIR/COBICISTAT/ELVITEGRAVIR/CIPROFLOXACIN/TENOFOVIR ABACAVIR/DOXYCYCLINE/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR ABACAVIR/DOXYCYCLINE/CIPROFLOXACIN/EMTRICITABINE/TENOFOVIR ; run;
Note that INDEXW is case sensitive so you may want to upcase your source_var.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Try below
if prxmatch ("m/substring1|substring2|substring3/i",source_var) > 0 then indicator=1 else indicator =0 ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You might post some example data and the searched substrings. If the searched substrings follow patterns then the PRX approach would likely work but if the targets are things like "This String", "bdQ rx - 12345" and "(**)brg" then PRX arent' going to help.
If you don't have more that a few of these targets you may want to look at:
indicator= (sum( count(source_var, "substring1") >0 , count(source_var, "substring2")>0, count(source_var, "substring3")>0) > 0);
or the index function as count may be overkill if presence is the only requirement.
Also is case insensitivity involved?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Basically, I have this text string variable for medication regimen (for example, "ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR"). Each medication is separated by "/". I want to create an indicator variable to capture the medication regimen that includes "ABACAVIR" and "EMTRICITABINE" ( the order does not matter). That's why I was using count() function but I thought there is probably a much quicker way to do it without having to do multiple count() or index() functions in a statement, especially to save time when I need to search for a large numbers of meds.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Medication names are not what I would think of are the best candidates for PRX as the names are not consistent.
indicator = (indexw(source_var,"ABACAVIR",'/') >0 and indexw(source_var,"EMTRICITABINE",'/') >0 );
should work for two to find both and extend with more if need more in an "and" useage, or nesting of () with or conditions.
How many is a 'Large number of meds' and are you discussing mutliple regimens in combination?
Here may be one way to work with varying number of concerned lists.
data example; informat source_var $200.; input source_var; /* the $25 below should be set large enough to hold the longest names expected*/ array reg1 {2} $25 _temporary_ ("ABACAVIR", "EMTRICITABINE"); array reg2 {3} $25 _temporary_ ("ABACAVIR","CIPROFLOXACIN", "DOXYCYCLINE"); /* strip is needed as the temporary array elements get padded to the length of the set length*/ /* use indexw to reduce chance of compounds considered matches*/ indreg1 = ( indexw(source_var,strip(reg1[1]),'/') >0 and indexw(source_var,strip(reg1[2]),'/') >0 ); /* loop to go over longer lists*/ counter=0; do i = 1 to dim(reg2); counter = sum(counter,( indexw(source_var,strip(reg2[i]),'/') >0)); end; indreg2= (counter=3); /* 3 indicates all 3 are found*/ drop i counter; label indreg1='Indicator for regime 1' indreg2='Indicator for regime 2' ; datalines; ABACAVIR/COBICISTAT/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR ABACAVIR/COBICISTAT/ELVITEGRAVIR/CIPROFLOXACIN/TENOFOVIR ABACAVIR/DOXYCYCLINE/ELVITEGRAVIR/EMTRICITABINE/TENOFOVIR ABACAVIR/DOXYCYCLINE/CIPROFLOXACIN/EMTRICITABINE/TENOFOVIR ; run;
Note that INDEXW is case sensitive so you may want to upcase your source_var.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content