scan function

Frequent Contributor
Posts: 134

I am using the SCAN function to count the number of drugs in a regimen, but I need to differentiate between ' - ' and '-'. In other words, a hyphen that does not have a space before and after it is part of a single drug name and should not split it. I am also listing some drugs that should not be counted. For example, I want initialChemoCount = 0 for sipuleucel-t and ziv-afilbercept, and initialChemoCount = 2 for 'fluorouracil - oxaliplatin'. I also want to make sure that a drug name containing spaces is counted as one drug (e.g. string1 = 'paclitaxel albumin bound').

 

The code below is close but not working because the scan reads 'sipuleucel-t' as two separate drugs and 'fluorouracil - irinotecan - ziv-aflibercept' as 3 when it should be 2 based on exclusions. Any suggestions?

 

data test  ;
*string1 = 'sipuleucel-t' ;  /*initialchemocount = 0 because I want to exclude them in the count*/

*string1 = 'ziv-afilbercept' ; /*initialchemocount = 0 because I want to exclude them in the count*/

*string1 = 'paclitaxel albumin bound' ; /*initialchemocount = 1*/
*string1 = 'fluorouracil - oxaliplatin' ;  /*initialchemocount = 2*/

string1 = 'fluorouracil - irinotecan - ziv-aflibercept' ; /*initialchemocount = 2 because I want to exclude ziv-aflibercept*/

 

/*Count the number of drugs in original drug combo*/
ComboCount = count(string1," - " ) + 1;

/*Count the number of chemo drugs in original drug combo*/
array initchemo{20} $200 ;

initialChemoCount = comboCount ;
do j = 1 to combocount until (p <= 0) ;
call scan(string1,j,p,l," - ") ;
if p=1 then initchemo[j] = substrn(string1, p, l-1); /*use p and l to account for legitimate space e.g. paclitaxel albumin bound */
else initchemo[j] = substrn(string1,p+1,l-1) ;

 

if index(initchemo[j],'investigational' ) then initialchemocount=. ;
else if initchemo[j] IN ('sipuleucel-t', 'ziv-afilbercept', 'abiraterone', 'enzalutamide', 'interferon alfa-2b', 'radium Ra 223 dichloride' )
then initialchemocount=0 ;
else if prxmatch("m/sipuleucel-t|ziv-afilbercept|ado-trastuzumab|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride/oi",initchemo[j]) > 0
then initialChemoCount = initialchemocount - 1 ;
end ;

run;

PROC PRINT DATA=test ;
VAR string1 initialchemocount combocount initchemo:;
run;

PROC Star
Posts: 7,468

Re: scan function


If I correctly understand what you're trying to do, then all you need is:

data have;
  length string1 $255;
  string1 = 'sipuleucel-t' ; output;
  string1 = 'ziv-afilbercept' ; output;
  string1 = 'paclitaxel albumin bound'; output;
  string1 = 'fluorouracil - oxaliplatin' ; output;
  string1 = 'fluorouracil - irinotecan - ziv-aflibercept'; output;
run;

data test;
  set have;
  string1=TRANWRD(string1,'- sipuleucel-t','');
  string1=TRANWRD(string1,'sipuleucel-t -','');
  string1=TRANWRD(string1,'sipuleucel-t','');
  string1=TRANWRD(string1,'- ziv-afilbercept','');
  string1=TRANWRD(string1,'ziv-afilbercept -','');
  string1=TRANWRD(string1,'ziv-afilbercept','');
  /*Count the number of drugs*/
  if missing(string1) then ComboCount=0;
  else ComboCount = count(string1," - " ) + 1;
run;

Art, CEO, AnalystFinder.com

PROC Star
Posts: 1,759

Re: scan function


Like this?

 

data HAVE ;
  length STRING1 $80;
  STRING1 = 'sipuleucel-t   ' ; output;
  STRING1 = 'ziv-afilbercept' ;   output;
  STRING1 = 'paclitaxel albumin bound' ;output; 
  STRING1 = 'fluorouracil - oxaliplatin' ; output; 
  STRING1 = 'fluorouracil - irinotecan - ziv-afilbercept' ;output;
run;

data WANT;
  set HAVE;
  %* remove unwanted drugs from list;
  STRING2=prxchange('s/(\b(sipuleucel-t|ziv-afilbercept|ado-trastuzumab|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b)/ /oi',-1,STRING1);
  %* replace drug names with tilde;
  STRING3=prxchange('s/(\w[a-z -]*? (?=[ -]))/~ /oi',-1,STRING2);
  %* count tildes;
  COUNT=countc(STRING3,'~');
run;

 

STRING1                                       COUNT
sipuleucel-t                                  0
ziv-afilbercept                               0
paclitaxel albumin bound                      1
fluorouracil - oxaliplatin                    2
fluorouracil - irinotecan - ziv-afilbercept   2
Frequent Contributor
Posts: 134

Re: scan function

Thank you. This looks like a much simpler way to accomplish what I want. When removing unwanted drugs, does PRXCHANGE match partial names, or do the drugs in the list have to match exactly? I want to omit all mabs from this count, so any drug that has 'mab' in its name should be excluded (e.g. bevacizumab). There are also various 'interferon' drugs that need to be excluded.

 

prxchange('s/(\w[a-z -]*? (?=[ -]))/~ /oi',-1,STRING2);

 

Also what does the -1 do in the prxchange function ?

 

PROC Star
Posts: 1,759

Re: scan function

To remove all drugs with mab:

 

  STRING2=prxchange('s/( ?\b(sipuleucel-t|ziv-afilbercept|[a-z-]*mab[a-z-]*|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b ?)/ /oi',-1,STRING1);
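The difference between the bare `mab` alternative and the `[a-z-]*mab[a-z-]*` alternative can be seen in a small sketch. The sample string and the variable names `s`, `whole` and `partial` are made up for illustration; the patterns are simplified from the one above.

```sas
data _null_;
  length s whole partial $60;
  s = 'bevacizumab - fluorouracil';
  /* \bmab\b only matches "mab" as a standalone word, so nothing changes */
  whole   = prxchange('s/\bmab\b/~/i', -1, s);
  /* letters or hyphens allowed on either side, so the whole drug name matches */
  partial = prxchange('s/\b[a-z-]*mab[a-z-]*\b/~/i', -1, s);
  put whole= / partial=;
run;
```

In this sketch `whole` should keep the original text, while `partial` should have `bevacizumab` replaced by `~`.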

The -1 tells PRXCHANGE to keep replacing until it reaches the end of the string, i.e. to replace every match.

 

A value of 1 would make just one replacement, and 2 would make at most two.
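A minimal sketch of how the second argument of PRXCHANGE limits the number of replacements. The test string and the variable names are invented for this example.

```sas
data _null_;
  length s all once twice $20;
  s = 'a-b-c-d';
  all   = prxchange('s/-/+/', -1, s);  /* every hyphen replaced: a+b+c+d   */
  once  = prxchange('s/-/+/',  1, s);  /* first hyphen only: a+b-c-d       */
  twice = prxchange('s/-/+/',  2, s);  /* first two hyphens: a+b+c-d       */
  put all= once= twice=;
run;
```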

PROC Star
Posts: 1,759

Re: scan function


You can also do the count using the method inspired by @mkeintz to reduce the usage of RegEx.

 

data HAVE ;
  length STRING1 $80;
  STRING1 = 'sipuleucel-t   ' ; output;
  STRING1 = 'ziv-afilbercept' ;   output;
  STRING1 = 'paclitaxel albumin bound' ;output; 
  STRING1 = 'fluorouracil - oxaliplatin - bevaci-zumab - gmabn' ; output; 
  STRING1 = 'fluorouracil - irinotecan - ziv-afilbercept' ;output;
run;

data WANT;
  set HAVE;
  %* replace drug separator ;
  STRING2=transtrn(string1,' - ',':');
  %* remove unwanted drugs from list;
  STRING3=prxchange('s/(\b(sipuleucel-t|ziv-afilbercept|[a-z-]*mab[a-z-]*|interferon|mab|abiraterone|enzalutamide|radium Ra 223 dichloride)\b:?)//oi',-1,STRING2);
  %* count remaining drugs;
  COUNT=countw(trimn(STRING3),':');
run;

 

STRING1                                             COUNT
sipuleucel-t                                        0
ziv-afilbercept                                     0
paclitaxel albumin bound                            1
fluorouracil - oxaliplatin - bevaci-zumab - gmabn   2
fluorouracil - irinotecan - ziv-afilbercept         2
Trusted Advisor
Posts: 1,018

Re: scan function

  1. Copy the string to a temporary variable, changing all instances of ' - ' to ':' (space-surrounded dashes to unsurrounded colons). This assumes there are no colons in any drug name.
  2. Remove the unwanted text ('sipuleucel-t' and/or 'ziv-aflibercept').
  3. Use the COUNTW function to count "words", where the word separator is a ':'.

 

 

BTW, you misspelled aflibercept in a couple of locations as afilbercept (they are all supposed to be the same, right?).

 

data test;
  input string1 $60.;
  put string1=;
datalines;
sipuleucel-t
ziv-aflibercept
paclitaxel albumin bound
fluorouracil - oxaliplatin
fluorouracil - irinotecan - ziv-aflibercept
run;
data want;
  set test;

  strng2=transtrn(string1,' - ',':');

  strng2=transtrn(strng2,'sipuleucel-t',trimn(''));
  strng2=transtrn(strng2,'ziv-aflibercept',trimn(''));

  if strng2='' then combo=0;
  else combo=countw(trim(strng2),':');
  drop strng2;  
run;

 

 

This program counts the desired terms.  I leave the rest of the tasks to you.

Trusted Advisor
Posts: 1,554

Re: scan function

I would like to focus on " I need to differentiate between ' - ' and '-'. ":

 

You can use the TRANWRD function to replace each ' - ' with a single-character delimiter such as '#' (or any other character that does not occur in a drug name).

Then use your code with scan(text,n,'#') to count.
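A rough sketch of that idea follows. SCAN and COUNTW treat each character of the delimiter argument as a separate delimiter, so " - " splits on every space and every hyphen; swapping the three-character separator for a single '#' first avoids that. The variable names `text`, `tmp`, `drug`, `n` and `i` are illustrative, and the sketch assumes '#' never appears in a drug name.

```sas
data _null_;
  length text tmp $200 drug $50;
  text = 'fluorouracil - irinotecan - ziv-aflibercept';
  /* turn the three-character separator into one character */
  tmp = tranwrd(text, ' - ', '#');
  /* '#' is now the only delimiter, so hyphens and spaces stay inside names */
  n = countw(tmp, '#');
  do i = 1 to n;
    drug = scan(tmp, i, '#');
    put i= drug=;
  end;
run;
```

Here `n` should be 3, with `ziv-aflibercept` kept as one name. Excluded drugs would still need to be filtered out separately, as in the earlier replies.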
