BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

I have a dataset with variable called drug_name. I'm want to subset to observations in which drug_name contains one of 200 strings corresponding to individual molecules. The brute force method is to use index repeatedly. For example, if there were only 3 strings of interest, I could do something like this:

 

data want; set have; where

index(drug_name, "ACETAMINOPHEN") > 0 or

index(drug_name, "IBUPROFEN") > 0 or

index(drug_name, "DIPHENHYDRAMINE") > 0;

run;

 

What is the more efficient way to do this? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Do you have the list of strings as dataset? Could you post some observations, so that we have something to play with?

SAS can create such a where-expression for you, the following code is hardly tested:

data DrugList;
   input name $upcase20.;
   datalines;
ACETAMiNOPhEN
IBUProFEN
DIPHENHyDRAMiNE
;


data _null_;
   set DrugList end=jobDone;
   file "drug_filter.sas";
   length Buffer $ 100;
   
   if _n_ = 1 then do;
      put 'where';
   end;
   
   Buffer = cats('index(drug_name, ', quote(trim(name)), ')');
   put '  ' Buffer @;
   
   if jobDone then do;
      put;
      put ';';
   end;
   else do;
      put ' or ';
   end;
run;

data want;
   set have;
   %include "drug_filter.sas";
run;

View solution in original post

9 REPLIES 9
Astounding
PROC Star

It depends on what you mean by "contains".  Doe sit mean the variable's value is an exact match to one of the 200 strings, or does it mean that the one of the 200 strings appears somewhere within the variable's value (possibly with other characters before or after).

chuakp
Obsidian | Level 7

To clarify, it does not have to be an exact match. I just want to subset to observations where one of the 200 strings appears somewhere within the variable's value.

Astounding
PROC Star

Then I think you are stuck.  You could remove all the comparison operators.  This would give you a valid comparison for one of the strings:

 

where index(var, 'ACETAMINOPHEN') or .....;

 

Also some functions have additional parameters that could ignore upper vs. lower case which might be a help.

andreas_lds
Jade | Level 19

Do you have the list of strings as dataset? Could you post some observations, so that we have something to play with?

SAS can create such a where-expression for you, the following code is hardly tested:

data DrugList;
   input name $upcase20.;
   datalines;
ACETAMiNOPhEN
IBUProFEN
DIPHENHyDRAMiNE
;


data _null_;
   set DrugList end=jobDone;
   file "drug_filter.sas";
   length Buffer $ 100;
   
   if _n_ = 1 then do;
      put 'where';
   end;
   
   Buffer = cats('index(drug_name, ', quote(trim(name)), ')');
   put '  ' Buffer @;
   
   if jobDone then do;
      put;
      put ';';
   end;
   else do;
      put ' or ';
   end;
run;

data want;
   set have;
   %include "drug_filter.sas";
run;
chuakp
Obsidian | Level 7

This code works, thanks very much!

s_lassen
Meteorite | Level 14

You can speed things up and simplify your program by using a format or an informat, e.g.:

proc format lib=work;
  invalue drug (upcase)
    "ACETAMINOPHEN",
    "IBUPROFEN",
    "DIPHENHYDRAMINE" =1
    other=0;
run;
    
    
data have;
infile cards truncover;
input text $200.;
cards;
 erew ACETAmINOPHEN werwerrewwerrwe
sdfælø klgdsklæ klgsdf
rewqv IBUPROFEN 3432 4
sgsfdsgfdsgfdgs
werreerw DIPHENHYDRAMINE 3422323442
;run; 

data want;
  set have;
  do _N_=1 to countw(text) until(found);
    found=input(scan(text,_N_),drug.);
    end;
run;

You may need to put some options on the SCAN function to get the words right.

chuakp
Obsidian | Level 7

Thanks. I think this would work, but the challenge would be that I would have to create a format with 200 values since I'm looking for 200 strings.

andreas_lds
Jade | Level 19

@chuakp wrote:

Thanks. I think this would work, but the challenge would be that I would have to create a format with 200 values since I'm looking for 200 strings.


This is not a problem at all. Proc Format accepts a dataset when using the cntlin-option.

 

 

Ksharp
Super User
data want; 
set have; 
if prxmatch('/\b(ACETAMINOPHEN|IBUPROFEN|DIPHENHYDRAMINE)\b/',drug_name );
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3751 views
  • 0 likes
  • 5 in conversation