BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I have a SAS output data set of drugs prescribed for a Medicaid population.

 

I have two issues: 1) setting up a if-then-else section to search for specific hiv/aids drugs by brand and/or generic. The following is the coding. The brand name is mentioned first and the generic is second. Another wrinkle is that there is one brand but (many times) several generics (or more like generic cocktail mix). (There are many lines so I am going to shorten the example. )

 

Data hivaids.hivaids_drugs (keep = DSC_NDC label);

    set hal_data.mcaid_drug_dn;

    If dsc_ndc eq "Emtriva" or "emtricitabine" then label eq "Yes";

   Else If dsc_ndc eq "Delstrigo" or "doravirine" and "lamivudine" and

   "tenofovir disoproxil fumarate" then label eq "Yes";

:

:

:

Else If dsc_ndc eq "Kaletra" or "lopinavir" and "ritonavir" then label eq "Yes";

 

So the error I run into is

INVALID numeric data, << drug name >> at line xxxx column xx

 

That error repeats many times.  All the drug names on the right hand side of

the  OR are the generics and there can be from two to five. I separated each by AND.

I received output which was basically just the column DSC_NDC in the SAS

output set.  What I really want is to match the list of hiv/aids drugs against the SAS output set and then only keep those where label = "Yes".

 

That is the first problem.

 

Secondly in the SAS output set there may be several lines for the same drug depending upon how the drug is administered. For example,

       Oncovin             Vial

       Oncovin              Syringe

      Oncovin               Syringe

 

I just want to identify the name once - but I get every line for each drug by mode of delivery as listed in the SAS dataset.

 

So the second problem is just having the name of any drug match listed only once.

 

Thank you for you assistance on this.

Walt Lierman

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @wlierman,

 

Another syntax error is "if ... then label eq ...": You cannot replace the equals sign in an assignment statement by the comparison operator eq. So it must be "if ... then label = ...".

 

The structure of your code can be simplified:

if condition1 then label="Yes";
else if condition2 then label="Yes";
...
else if condition99 then label="Yes";

is equivalent to

if condition1 or condition2 or ... or condition99 then label="Yes";

 

If you want to keep only those observations where label="Yes", then you may not even need the new variable (label) because you can use a subsetting IF statement:

if condition1 or condition2 or ... or condition99;

 

The FINDW function and other character functions will be helpful in identifying combinations. For example, the IF condition

findw(dsc_ndc,'lopinavir',,'ikad') & findw(dsc_ndc,'ritonavir',,'ikad')

would be met for dsc_ndc values containing the words "lopinavir" and "ritonavir" regardless of upper/lower case (modifier "i" in the last argument), treating all characters except (modifier "k") alphabetic characters (modifier "a") and digits (modifier "d") as word delimiters. Examples of matched strings include:

Lopinavir/Ritonavir
LOPINAVIR+RITONAVIR
ritonavir and lopinavir
ritonavir, AZT and lopinavir

Instead of using the modifiers you could also apply FINDW to, e.g., lowcase(dsc_ndc):

drug=lowcase(dsc_ndc);
if findw(drug,'lopinavir') & findw(drug,'ritonavir') ...;

 

Unlike the FINDW function, the FIND function would search for substrings and hence find, e.g. "amprenavir" in "fosamprenavir" -- even though these are different drugs.

 

More advanced character functions would be needed to deal with varying spellings or typos in the drug names, if any.

 

Duplicates can be eliminated in a subsequent step (e.g., PROC SORT with NODUPKEY option) or in the same step (e.g., PROC SQL with select distinct(upcase(dsc_ndc)) from hal_data.mcaid_drug_dn where condition1 or condition2 or ...). You may want to create a new variable containing the preferred name for the various equivalent verbatim terms so that a NODUPKEY sort by that variable would eliminate duplicates such as "Kaletra" and "lopinavir+ritonavir".

View solution in original post

3 REPLIES 3
ChrisNZ
Tourmaline | Level 20

This invalid syntax:

   If dsc_ndc eq "Emtriva" or "emtricitabine" 

You want:

   If dsc_ndc in ("Emtriva", "emtricitabine" )

 

I have no idea what this could mean:

   If dsc_ndc eq "Kaletra" or "lopinavir" and "ritonavir" 

 

 

 

 

FreelanceReinh
Jade | Level 19

Hi @wlierman,

 

Another syntax error is "if ... then label eq ...": You cannot replace the equals sign in an assignment statement by the comparison operator eq. So it must be "if ... then label = ...".

 

The structure of your code can be simplified:

if condition1 then label="Yes";
else if condition2 then label="Yes";
...
else if condition99 then label="Yes";

is equivalent to

if condition1 or condition2 or ... or condition99 then label="Yes";

 

If you want to keep only those observations where label="Yes", then you may not even need the new variable (label) because you can use a subsetting IF statement:

if condition1 or condition2 or ... or condition99;

 

The FINDW function and other character functions will be helpful in identifying combinations. For example, the IF condition

findw(dsc_ndc,'lopinavir',,'ikad') & findw(dsc_ndc,'ritonavir',,'ikad')

would be met for dsc_ndc values containing the words "lopinavir" and "ritonavir" regardless of upper/lower case (modifier "i" in the last argument), treating all characters except (modifier "k") alphabetic characters (modifier "a") and digits (modifier "d") as word delimiters. Examples of matched strings include:

Lopinavir/Ritonavir
LOPINAVIR+RITONAVIR
ritonavir and lopinavir
ritonavir, AZT and lopinavir

Instead of using the modifiers you could also apply FINDW to, e.g., lowcase(dsc_ndc):

drug=lowcase(dsc_ndc);
if findw(drug,'lopinavir') & findw(drug,'ritonavir') ...;

 

Unlike the FINDW function, the FIND function would search for substrings and hence find, e.g. "amprenavir" in "fosamprenavir" -- even though these are different drugs.

 

More advanced character functions would be needed to deal with varying spellings or typos in the drug names, if any.

 

Duplicates can be eliminated in a subsequent step (e.g., PROC SORT with NODUPKEY option) or in the same step (e.g., PROC SQL with select distinct(upcase(dsc_ndc)) from hal_data.mcaid_drug_dn where condition1 or condition2 or ...). You may want to create a new variable containing the preferred name for the various equivalent verbatim terms so that a NODUPKEY sort by that variable would eliminate duplicates such as "Kaletra" and "lopinavir+ritonavir".

wlierman
Lapis Lazuli | Level 10

Richard,

 

Thank you for the extensive answer.

 

I appreciate your time and help.

 

Walt

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 588 views
  • 2 likes
  • 3 in conversation