I have a table of >3,000 individual toxicology reports. The results include standalone drug results (ie 'thc' 'oxycodone' midazolam') as well as screening results for positive or negative (ie 'cannabinoids' 'benzodiazepines' 'opiates'). I need to remove screening elements for each observation (id) when the standalone drug is present with the screening result. But I also want to keep screening elements when standalone drug results are missing.
Here is what I HAVE:
id | drug1 | drug2 | drug3 | drug4 | drug5 |
001 | thc/carboxy-thc | cannabinoids | |||
002 | amphetamine | benzodiazepines | midazolam | ||
003 | cannabinoids | carboxy-thc | |||
004 | cannabinoids | hydromorphone | morphine | opiates | oxycodone |
005 | amphetamines | cannabinoids | thc | ||
006 | cannabinoids |
Here is what I WANT:
id | drug1 | drug2 | drug3 | drug4 | drug5 |
001 | thc/carboxy-thc | ||||
002 | amphetamine | midazolam | |||
003 | carboxy-thc | ||||
004 | cannabinoids | hydromorphone | morphine | oxycodone | |
005 | amphetamines | thc | |||
006 | cannabinoids |
Here is a sample of the data:
data have; infile datalines truncover; informat id $5. drug1-drug5 $40.; input id drug1 drug2 drug3 drug4 drug5; datalines; 001 thc/carboxy-thc cannabinoids 002 amphetamine benzodiazepines midazolam 003 cannabinoids carboxy-thc 004 cannabinoids hydromorphone morphine opiates oxycodone 005 amphetamines cannabinoids thc 006 cannabinoids ;
I started to attempt an array, but have limited experience and couldn't resolve:
data want; set have end=done; array p(*) drug1-drug5; array druglist(5) $40; do i=1 to dim(p); if p(i) in ('thc' 'carboxy-thc' 'thc/carboxy-thc') then druglist(i)= ; end;
Thanks!
You need to create the data to drive the analysis.
So first let's make an example dataset for mapping the DRUG to the CLASS.
data classes ;
input drug :$40. class :$40.;
cards;
thc/carboxy-thc cannabinoids
thc cannabinoids
carboxy-thc cannabinoids
amphetamine benzodiazepines
morphine opiates
;
Now let's run a data step that loads that dataset into an HASH object so we can use it to convert the drug into the class. You could also do that part using a format instead.
So for each element in the array of drugs we check if it has a CLASS associated with it. Then we blank out any values of the values in the array that have that class name as the drug name.
data want ;
set have classes(obs=0) ;
if _n_=1 then do;
declare hash h(dataset:'classes');
rc=h.definekey('drug');
rc=h.definedata('class');
rc=h.definedone();
end;
array drugs drug1-drug5 ;
do index=1 to dim(drugs);
if not h.find(key:drugs[index]) then do;
next = whichc(class,of drugs[*]);
do while (next) ;
drugs[next]=' ';
next = whichc(class,of drugs[*]);
end;
end;
end;
drop rc drug class index next;
run;
Note this REQUIRES that you have the mapping.
It also will not work (without additional logic) if the DRUG name of some drug happens to match the CLASS name of some drug class.
If you know there are never any duplicate names in the variables in the array, that is you will never have 'cannibis' in more than one of the DRUG1 to DRUG5 variables, then you can reduce the inner most DO WHILE loop to a simple IF/THEN instead.
if next then drugs[next]=' ';
You are going to have to provide some details of why cannabinoids is removed for ID3 in drug1 but not for ID4.
You need to cast statements in terms of DATA elements provided. You have used "screening elements" without a definition or rules. Same for "standalone drug". Note that an example is not the same as a rule unless you explicitly state that those are the only values that can occur. But the way you treat cannabinoid in the example above means that is not the case.
The difference is screening results (cannabinoids) and confirmatory results (thc, carboxy-thc, thc/carboxy-thc).
id-003: cannabinoids is removed because it is a screening indicator for carboxy-thc (also thc, thc/carboxy-thc), so I want to remove the screening indicator when the standalone drug is present.
id-004: cannabinoids remains because the toxicology results do not show standalone thc, carboxy-thc, or thc/carboxy-thc.
So, when the standalone drug is present (carboyxy-thc) I want to remove the screening indicator because I have confirmed results of the standalone drug. However, if I do not have confirmatory results, but a screening result was detected, then I want to retain the screening element (cannabinoids)so that way I know that those drugs were still present, even when the standalone drug result (thc, carboxy-thc) is absent
You'll also need to provide mappings of which drugs are in the same classification.
Ie THC and cannibinoids are the same classification, Opiods, Oxycodone, Hydromorphone are all in the same classification.
Is it only the drugs listed in this shown table or does it extend beyond these drugs?
Cannabinoids (thc, carboxy-thc, thc/carboxy-thc)
Opiates (hydromorphone, morphine, oxycodone)
Benzodiazepines (midazolam)
That covers things for the sample, but there are dozens of other drugs for other categories that I have not listed here. I was hoping to get a basic example of how to execute this and then I could fill in the blanks and add additional conditions and categories.
Here is a possible example of a RULE that can be programmed:
Delete cannabinoids from a Drug variable when one or more of the other drug variables contains one of thc, carboxy-thc, thc/carboxy-thc (is that the entire list???).
Or
Keep the value of cannabinoids only when NONE of the other drug variables contain any of thc, carboxy-thc, thc/carboxy-thc.
Nothing about "standalone" "screen" or such in the rule. Review these two suggestions and see if they are correct.
Which means that you need to provide such a rule for each and the comparison list has to been exhaustive (cover every possible value.
Yes. I had mocked up a rule similar to this such that:
if drug1-drug6 in ("thc" "carboxy-thc" "thc/carboxy-thc" "thc/carboxy-thc/11-hydroxy delta 9 thc") then delete "cannabinoids";
So, yes, if a row has 'thc' or 'carboxy-thc' or 'thc/carboxy-thc' or thc/carboxy-thc/11-hydroxy delta 9 thc' AND 'cannabinoids', I want to DELETE 'cannabinoids'.
BUT if a row has 'cannabinoids' and DOES NOT HAVE 'thc' or 'carboxy-thc' or 'thc/carboxy-thc' or thc/carboxy-thc/11-hydroxy delta 9 thc' then I want to KEEP'cannabinoids'
This is the same as your first statement. I can go through my list of dozens of drugs and edit the rule for each scenario. I just need some code that I can plug in to.
It looks to me that you have a "black list" of screening results that you want to blank out in the DRUG1-DRUG5 variables.
If so, you can establish a hash object.(think lookup table) of black list terms. Then step through the array of drug variables and check whether the array element is found in the hash object (i.e. the black_list.check method returns a zero).
The DO loop not only converts some array elements to blanks, but also stops when a blank element is already there, because it assumes that once a blank is found, all the elements to the right will also be blank.
Untested code follows, in the absence of sample data in the form of a working data step:
data want (drop=d _:);
set have;
if _n_=1 then do;
length _screen $40;
declare hash black_list ();
black_list.definekey('_screen');
black_list.definedone();
do _screen = 'cannabinoids','benzodiazepines','opiates';
black_list.add();
end;
end;
array drg {*} $40 drug1-drug5 ;
do d=1 to dim(drg) while (drg{d}^=' ');
if black_list.check(key:drg{d})=0 then drg{d}=' ';
end;
run;
Sorry. I tested the data step and it worked for me. Not sure why it isn't working for others.
I tested the code and it technically works, but another condition I need is retaining the screening element (cannabinoids) if the standalone/confirmatory results (thc, carboxy-thc, thc/carboxy-thc) is absent.
Your code gave me this:
id | drug1 | drug2 | drug3 | drug4 | drug5 |
001 | thc/carboxy-thc | ||||
002 | amphetamine | midazolam | |||
003 | carboxy-thc | ||||
004 | hydromorphone | morphine | oxycodone | ||
005 | amphetamines | thc | |||
006 |
The two cannabinoids with the strike through are missing from the output table. These are conditions where I still want to know that a person tested positive for cannabinoids when there was not a result for thc, carboxy-thc, etc.
I know this is a tricky request... Thank you for your efforts!
You need to create the data to drive the analysis.
So first let's make an example dataset for mapping the DRUG to the CLASS.
data classes ;
input drug :$40. class :$40.;
cards;
thc/carboxy-thc cannabinoids
thc cannabinoids
carboxy-thc cannabinoids
amphetamine benzodiazepines
morphine opiates
;
Now let's run a data step that loads that dataset into an HASH object so we can use it to convert the drug into the class. You could also do that part using a format instead.
So for each element in the array of drugs we check if it has a CLASS associated with it. Then we blank out any values of the values in the array that have that class name as the drug name.
data want ;
set have classes(obs=0) ;
if _n_=1 then do;
declare hash h(dataset:'classes');
rc=h.definekey('drug');
rc=h.definedata('class');
rc=h.definedone();
end;
array drugs drug1-drug5 ;
do index=1 to dim(drugs);
if not h.find(key:drugs[index]) then do;
next = whichc(class,of drugs[*]);
do while (next) ;
drugs[next]=' ';
next = whichc(class,of drugs[*]);
end;
end;
end;
drop rc drug class index next;
run;
Note this REQUIRES that you have the mapping.
It also will not work (without additional logic) if the DRUG name of some drug happens to match the CLASS name of some drug class.
If you know there are never any duplicate names in the variables in the array, that is you will never have 'cannibis' in more than one of the DRUG1 to DRUG5 variables, then you can reduce the inner most DO WHILE loop to a simple IF/THEN instead.
if next then drugs[next]=' ';
It works! This is an intimidating chunk of code.. clearly I have much more to learn about SAS, but this is a great example to help me complete the task and similar future projects. Thanks for your help, I would not have been able to figure this out on my own 🙂
@mtr91 wrote:
It works! This is an intimidating chunk of code.. clearly I have much more to learn about SAS, but this is a great example to help me complete the task and similar future projects. Thanks for your help, I would not have been able to figure this out on my own 🙂
It is still going to be much easier to deal with this data if you transpose it to have a single DRUG variable and multiple observations when there are multiple drugs. Then your problem here becomes one of only keeping the CLASS names in the DRUG column when they are the only DRUG listed for that class.
data want;
set have ;
by id class drug;
if class = drug and not (first.class and last.class) then delete;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.