CP2
Pyrite | Level 9

I am using the SCAN function to count the number of drugs in a regimen; however, I need to differentiate between ' - ' and '-'. In other words, if a hyphen does not have a space before and after it, the text around it should count as one drug. I also have a list of drugs that should not be counted. For example, I want initialChemoCount = 0 for sipuleucel-t and ziv-afilbercept, and initialChemoCount = 2 for 'fluorouracil - oxaliplatin'. I also want to make sure a drug name with spaces is counted as 1 drug (e.g. string1 = 'paclitaxel albumin bound').

 

The code below is close but not working because the scan reads 'sipuleucel-t' as two separate drugs and 'fluorouracil - irinotecan - ziv-aflibercept' as 3 when it should be 2 based on exclusions. Any suggestions?

 

data test  ;
*string1 = 'sipuleucel-t' ;  /*initialchemocount = 0 because I want to exclude them in the count*/

*string1 = 'ziv-afilbercept' ; /*initialchemocount = 0 because I want to exclude them in the count*/

*string1 = 'paclitaxel albumin bound' ; /*initialchemocount = 1*/
*string1 = 'fluorouracil - oxaliplatin' ;  /*initialchemocount = 2*/

string1 = 'fluorouracil - irinotecan - ziv-aflibercept' ; /*initialchemocount = 2 because I want to exclude ziv-aflibercept*/

 

/*Count the number of drugs in original drug combo*/
ComboCount = count(string1," - " ) + 1;

/*Count the number of chemo drugs in original drug combo*/
array initchemo{20} $200 ;

initialChemoCount = comboCount ;
do j = 1 to combocount until (p <= 0) ;
call scan(string1,j,p,l," - ") ;
if p=1 then initchemo[j] = substrn(string1, p, l-1); /*use p and l to account for legitimate space e.g. paclitaxel albumin bound */
else initchemo[j] = substrn(string1,p+1,l-1) ;

 

if index(initchemo[j],'investigational' ) then initialchemocount=. ;
else if initchemo[j] IN ('sipuleucel-t', 'ziv-afilbercept', 'abiraterone', 'enzalutamide', 'interferon alfa-2b', 'radium Ra 223 dichloride' )
then initialchemocount=0 ;
else if prxmatch("m/sipuleucel-t|ziv-afilbercept|ado-trastuzumab|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride/oi",initchemo[j]) > 0
then initialChemoCount = initialchemocount - 1 ;
end ;

run;

PROC PRINT DATA=test ;
VAR string1 initialchemocount combocount initchemo:;
run;

7 REPLIES
art297
Opal | Level 21

If I correctly understand what you're trying to do, then all you need is:

data have;
  length string1 $255;
  string1 = 'sipuleucel-t' ; output;
  string1 = 'ziv-afilbercept' ; output;
  string1 = 'paclitaxel albumin bound'; output;
  string1 = 'fluorouracil - oxaliplatin' ; output;
  string1 = 'fluorouracil - irinotecan - ziv-aflibercept'; output;
run;

data test;
  set have;
  string1=TRANWRD(string1,'- sipuleucel-t','');
  string1=TRANWRD(string1,'sipuleucel-t -','');
  string1=TRANWRD(string1,'sipuleucel-t','');
  string1=TRANWRD(string1,'- ziv-afilbercept','');
  string1=TRANWRD(string1,'ziv-afilbercept -','');
  string1=TRANWRD(string1,'ziv-afilbercept','');
  /*Count the number of drugs*/
  if missing(string1) then ComboCount=0;
  else ComboCount = count(string1," - " ) + 1;
run;

Art, CEO, AnalystFinder.com

ChrisNZ
Tourmaline | Level 20

Like this?

 

data HAVE ;
  length STRING1 $80;
  STRING1 = 'sipuleucel-t   ' ; output;
  STRING1 = 'ziv-afilbercept' ;   output;
  STRING1 = 'paclitaxel albumin bound' ;output; 
  STRING1 = 'fluorouracil - oxaliplatin' ; output; 
  STRING1 = 'fluorouracil - irinotecan - ziv-afilbercept' ;output;
run;

data WANT;
  set HAVE;
  %* remove unwanted drugs from list;
  STRING2=prxchange('s/(\b(sipuleucel-t|ziv-afilbercept|ado-trastuzumab|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b)/ /oi',-1,STRING1);
  %* replace drug names with tilde;
  STRING3=prxchange('s/(\w[a-z -]*? (?=[ -]))/~ /oi',-1,STRING2);
  %* count tildes;
  COUNT=countc(STRING3,'~');
run;

 

STRING1 COUNT
sipuleucel-t 0
ziv-afilbercept 0
paclitaxel albumin bound 1
fluorouracil - oxaliplatin 2
fluorouracil - irinotecan - ziv-afilbercept 2
CP2
Pyrite | Level 9

Thank you. This looks like a much simpler way to accomplish what I want. When removing unwanted drugs, does prxchange use partial matches, or do the drugs in the list have to be exact matches? I want to omit all mabs from this count, so any drug that has 'mab' in its name would be excluded (e.g. bevacizumab). There are also various 'interferon' drugs that need to be excluded.

 

prxchange('s/(\w[a-z -]*? (?=[ -]))/~ /oi',-1,STRING2);

 

Also what does the -1 do in the prxchange function ?

 

ChrisNZ
Tourmaline | Level 20

To remove all drugs with mab:

 

  STRING2=prxchange('s/( ?\b(sipuleucel-t|ziv-afilbercept|[a-z-]*mab[a-z-]*|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b ?)/ /oi',-1,STRING1);

-1 seeks as many changes as possible.

 

1 would just do one replacement (and 2 would do 2 replacements at most)
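
To make the effect of that second argument concrete, here is a minimal sketch. The test string and the variable names (s, all, one, two) are made up for illustration and are not part of the code above; the pattern simply swaps hyphens for plus signs.

```sas
data _null_;
  length s all one two $20;
  s = 'a-b-c-d';                       /* made-up test string */
  all = prxchange('s/-/+/', -1, s);    /* -1: replace every match */
  one = prxchange('s/-/+/',  1, s);    /*  1: replace the first match only */
  two = prxchange('s/-/+/',  2, s);    /*  2: replace at most the first two matches */
  put all= one= two=;                  /* write the three results to the log */
run;
```

Comparing the three values in the log should show how many hyphens each call replaced.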

ChrisNZ
Tourmaline | Level 20

You can also do the count using the method inspired by @mkeintz to reduce the usage of RegEx.

 

data HAVE ;
  length STRING1 $80;
  STRING1 = 'sipuleucel-t   ' ; output;
  STRING1 = 'ziv-afilbercept' ;   output;
  STRING1 = 'paclitaxel albumin bound' ;output; 
  STRING1 = 'fluorouracil - oxaliplatin - bevaci-zumab - gmabn' ; output; 
  STRING1 = 'fluorouracil - irinotecan - ziv-afilbercept' ;output;
run;

data WANT;
  set HAVE;
  %* replace drug separator ;
  STRING2=transtrn(string1,' - ',':');
  %* remove unwanted drugs from list;
  STRING3=prxchange('s/(\b(sipuleucel-t|ziv-afilbercept|[a-z-]*mab[a-z-]*|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b:?)//oi',-1,STRING2);
  %* count remaining drugs;
  COUNT=countw(trimn(STRING3),':');
run;

 

STRING1 COUNT
sipuleucel-t 0
ziv-afilbercept 0
paclitaxel albumin bound 1
fluorouracil - oxaliplatin - bevaci-zumab - gmabn 2
fluorouracil - irinotecan - ziv-afilbercept 2
mkeintz
PROC Star
  1. Copy the string to a temporary variable, changing all instances of ' - ' to ':'  (space-surrounded dashes to unsurrounded colons)  This assumes there are no colons in any drug name.
  2. Remove the unwanted text ('sipuleucel-t' and/or 'ziv-aflibercept')
  3. Use the COUNTW function to count "words", where the word separator is a ':'.

 

 

BTW, you misspelled aflibercept in a couple of locations as afilbercept (they are all supposed to be the same, right?).

 

data test;
  input string1 $60.;
  put string1=;
datalines;
sipuleucel-t
ziv-aflibercept
paclitaxel albumin bound
fluorouracil - oxaliplatin
fluorouracil - irinotecan - ziv-aflibercept
run;
data want;
  set test;

  strng2=transtrn(string1,' - ',':');

  strng2=transtrn(strng2,'sipuleucel-t',trimn(''));
  strng2=transtrn(strng2,'ziv-aflibercept',trimn(''));

  if strng2='' then combo=0;
  else combo=countw(trim(strng2),':');
  drop strng2;  
run;

 

 

This program counts the desired terms.  I leave the rest of the tasks to you.

Shmuel
Garnet | Level 18

I would like to focus on " I need to differentiate between ' - ' and '-'. ":

 

You can use the TRANWRD function to replace ' - ' with a delimiter such as '#' (or any other character that does not occur in a drug name).

Then use your code with scan(text,n,'#') to count.
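
A rough sketch of that idea, assuming '#' never appears in a drug name. The dataset name demo and the variables tmp, n, i and drug are only illustrative, and COUNTW is my addition to get the loop bound; it is not in your original code.

```sas
data demo;
  length string1 tmp $80 drug $40;
  string1 = 'fluorouracil - irinotecan - ziv-aflibercept';

  /* Swap the space-surrounded hyphen for '#'; hyphens inside a name are untouched */
  tmp = tranwrd(string1, ' - ', '#');

  /* Count the '#'-delimited items, then pull each one out with SCAN */
  n = countw(tmp, '#');
  do i = 1 to n;
    drug = scan(tmp, i, '#');
    put i= drug=;                      /* write each extracted drug to the log */
  end;
run;
```

Because '#' is the only delimiter passed to COUNTW and SCAN, embedded spaces (as in 'paclitaxel albumin bound') and bare hyphens (as in 'ziv-aflibercept') stay inside a single item. The exclusion list would still need to be applied to each extracted drug afterwards.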

