I have a SAS output data set of drugs prescribed for a Medicaid population.
I have two issues: 1) setting up a if-then-else section to search for specific hiv/aids drugs by brand and/or generic. The following is the coding. The brand name is mentioned first and the generic is second. Another wrinkle is that there is one brand but (many times) several generics (or more like generic cocktail mix). (There are many lines so I am going to shorten the example. )
Data hivaids.hivaids_drugs (keep = DSC_NDC label);
set hal_data.mcaid_drug_dn;
If dsc_ndc eq "Emtriva" or "emtricitabine" then label eq "Yes";
Else If dsc_ndc eq "Delstrigo" or "doravirine" and "lamivudine" and
"tenofovir disoproxil fumarate" then label eq "Yes";
:
:
:
Else If dsc_ndc eq "Kaletra" or "lopinavir" and "ritonavir" then label eq "Yes";
So the error I run into is
INVALID numeric data, << drug name >> at line xxxx column xx
That error repeats many times. All the drug names on the right hand side of
the OR are the generics and there can be from two to five. I separated each by AND.
I received output which was basically just the column DSC_NDC in the SAS
output set. What I really want is to match the list of hiv/aids drugs against the SAS output set and then only keep those where label = "Yes".
That is the first problem.
Secondly in the SAS output set there may be several lines for the same drug depending upon how the drug is administered. For example,
Oncovin Vial
Oncovin Syringe
Oncovin Syringe
I just want to identify the name once - but I get every line for each drug by mode of delivery as listed in the SAS dataset.
So the second problem is just having the name of any drug match listed only once.
Thank you for you assistance on this.
Walt Lierman
Hi @wlierman,
Another syntax error is "if ... then label eq ...": You cannot replace the equals sign in an assignment statement by the comparison operator eq. So it must be "if ... then label = ...".
The structure of your code can be simplified:
if condition1 then label="Yes"; else if condition2 then label="Yes"; ... else if condition99 then label="Yes";
is equivalent to
if condition1 or condition2 or ... or condition99 then label="Yes";
If you want to keep only those observations where label="Yes", then you may not even need the new variable (label) because you can use a subsetting IF statement:
if condition1 or condition2 or ... or condition99;
The FINDW function and other character functions will be helpful in identifying combinations. For example, the IF condition
findw(dsc_ndc,'lopinavir',,'ikad') & findw(dsc_ndc,'ritonavir',,'ikad')
would be met for dsc_ndc values containing the words "lopinavir" and "ritonavir" regardless of upper/lower case (modifier "i" in the last argument), treating all characters except (modifier "k") alphabetic characters (modifier "a") and digits (modifier "d") as word delimiters. Examples of matched strings include:
Lopinavir/Ritonavir LOPINAVIR+RITONAVIR ritonavir and lopinavir ritonavir, AZT and lopinavir
Instead of using the modifiers you could also apply FINDW to, e.g., lowcase(dsc_ndc):
drug=lowcase(dsc_ndc);
if findw(drug,'lopinavir') & findw(drug,'ritonavir') ...;
Unlike the FINDW function, the FIND function would search for substrings and hence find, e.g. "amprenavir" in "fosamprenavir" -- even though these are different drugs.
More advanced character functions would be needed to deal with varying spellings or typos in the drug names, if any.
Duplicates can be eliminated in a subsequent step (e.g., PROC SORT with NODUPKEY option) or in the same step (e.g., PROC SQL with select distinct(upcase(dsc_ndc)) from hal_data.mcaid_drug_dn where condition1 or condition2 or ...). You may want to create a new variable containing the preferred name for the various equivalent verbatim terms so that a NODUPKEY sort by that variable would eliminate duplicates such as "Kaletra" and "lopinavir+ritonavir".
This invalid syntax:
If dsc_ndc eq "Emtriva" or "emtricitabine"
You want:
If dsc_ndc in ("Emtriva", "emtricitabine" )
I have no idea what this could mean:
If dsc_ndc eq "Kaletra" or "lopinavir" and "ritonavir"
Hi @wlierman,
Another syntax error is "if ... then label eq ...": You cannot replace the equals sign in an assignment statement by the comparison operator eq. So it must be "if ... then label = ...".
The structure of your code can be simplified:
if condition1 then label="Yes"; else if condition2 then label="Yes"; ... else if condition99 then label="Yes";
is equivalent to
if condition1 or condition2 or ... or condition99 then label="Yes";
If you want to keep only those observations where label="Yes", then you may not even need the new variable (label) because you can use a subsetting IF statement:
if condition1 or condition2 or ... or condition99;
The FINDW function and other character functions will be helpful in identifying combinations. For example, the IF condition
findw(dsc_ndc,'lopinavir',,'ikad') & findw(dsc_ndc,'ritonavir',,'ikad')
would be met for dsc_ndc values containing the words "lopinavir" and "ritonavir" regardless of upper/lower case (modifier "i" in the last argument), treating all characters except (modifier "k") alphabetic characters (modifier "a") and digits (modifier "d") as word delimiters. Examples of matched strings include:
Lopinavir/Ritonavir LOPINAVIR+RITONAVIR ritonavir and lopinavir ritonavir, AZT and lopinavir
Instead of using the modifiers you could also apply FINDW to, e.g., lowcase(dsc_ndc):
drug=lowcase(dsc_ndc);
if findw(drug,'lopinavir') & findw(drug,'ritonavir') ...;
Unlike the FINDW function, the FIND function would search for substrings and hence find, e.g. "amprenavir" in "fosamprenavir" -- even though these are different drugs.
More advanced character functions would be needed to deal with varying spellings or typos in the drug names, if any.
Duplicates can be eliminated in a subsequent step (e.g., PROC SORT with NODUPKEY option) or in the same step (e.g., PROC SQL with select distinct(upcase(dsc_ndc)) from hal_data.mcaid_drug_dn where condition1 or condition2 or ...). You may want to create a new variable containing the preferred name for the various equivalent verbatim terms so that a NODUPKEY sort by that variable would eliminate duplicates such as "Kaletra" and "lopinavir+ritonavir".
Richard,
Thank you for the extensive answer.
I appreciate your time and help.
Walt
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.