Solved: Re: Deleting single data element from observation when condition is me...

mtr91 · Posted 06-15-2022 11:30 AM

I have a table of >3,000 individual toxicology reports. The results include standalone drug results (ie 'thc' 'oxycodone' midazolam') as well as screening results for positive or negative (ie 'cannabinoids' 'benzodiazepines' 'opiates'). I need to remove screening elements for each observation (id) when the standalone drug is present with the screening result. But I also want to keep screening elements when standalone drug results are missing.

Here is what I HAVE:

id	drug1	drug2	drug3	drug4	drug5
001	thc/carboxy-thc	cannabinoids
002	amphetamine	benzodiazepines	midazolam
003	cannabinoids	carboxy-thc
004	cannabinoids	hydromorphone	morphine	opiates	oxycodone
005	amphetamines	cannabinoids	thc
006	cannabinoids

Here is what I WANT:

id	drug1	drug2	drug3	drug4	drug5
001	thc/carboxy-thc
002	amphetamine		midazolam
003		carboxy-thc
004	cannabinoids	hydromorphone	morphine		oxycodone
005	amphetamines		thc
006	cannabinoids

Here is a sample of the data:

data have;
   infile datalines truncover;
   informat id $5. drug1-drug5 $40.;
   input id drug1 drug2 drug3 drug4 drug5;
datalines;
001 thc/carboxy-thc cannabinoids
002 amphetamine benzodiazepines midazolam
003 cannabinoids carboxy-thc
004 cannabinoids hydromorphone morphine opiates oxycodone
005 amphetamines cannabinoids thc
006 cannabinoids
;

I started to attempt an array, but have limited experience and couldn't resolve:

data want;
   set have end=done;
array p(*) drug1-drug5;
array druglist(5) $40;
do i=1 to dim(p);
if p(i) in ('thc' 'carboxy-thc' 'thc/carboxy-thc') then druglist(i)=  ;
end;

Thanks!

Tom · Posted 06-15-2022 01:12 PM

You need to create the data to drive the analysis.

So first let's make an example dataset for mapping the DRUG to the CLASS.

data classes ;
  input drug :$40. class :$40.;
cards;
thc/carboxy-thc cannabinoids
thc cannabinoids
carboxy-thc cannabinoids
amphetamine benzodiazepines
morphine opiates
;

Now let's run a data step that loads that dataset into an HASH object so we can use it to convert the drug into the class. You could also do that part using a format instead.

So for each element in the array of drugs we check if it has a CLASS associated with it. Then we blank out any values of the values in the array that have that class name as the drug name.


data want ;
  set have classes(obs=0) ;
  if _n_=1 then do;
    declare hash h(dataset:'classes');
    rc=h.definekey('drug');
    rc=h.definedata('class');
    rc=h.definedone();
  end;
  array drugs drug1-drug5 ;
  do index=1 to dim(drugs);
    if not h.find(key:drugs[index]) then do;
      next = whichc(class,of drugs[*]);
      do while (next) ;
        drugs[next]=' ';
        next = whichc(class,of drugs[*]);
      end;
    end;  
  end;
  drop rc drug class index next;
run;

Note this REQUIRES that you have the mapping.

It also will not work (without additional logic) if the DRUG name of some drug happens to match the CLASS name of some drug class.

If you know there are never any duplicate names in the variables in the array, that is you will never have 'cannibis' in more than one of the DRUG1 to DRUG5 variables, then you can reduce the inner most DO WHILE loop to a simple IF/THEN instead.

      if next then drugs[next]=' ';

View solution in original post

ballardw · Posted 06-15-2022 11:46 AM

You are going to have to provide some details of why cannabinoids is removed for ID3 in drug1 but not for ID4.

You need to cast statements in terms of DATA elements provided. You have used "screening elements" without a definition or rules. Same for "standalone drug". Note that an example is not the same as a rule unless you explicitly state that those are the only values that can occur. But the way you treat cannabinoid in the example above means that is not the case.

mtr91 · Posted 06-15-2022 12:06 PM

The difference is screening results (cannabinoids) and confirmatory results (thc, carboxy-thc, thc/carboxy-thc).

id-003: cannabinoids is removed because it is a screening indicator for carboxy-thc (also thc, thc/carboxy-thc), so I want to remove the screening indicator when the standalone drug is present.

id-004: cannabinoids remains because the toxicology results do not show standalone thc, carboxy-thc, or thc/carboxy-thc.

So, when the standalone drug is present (carboyxy-thc) I want to remove the screening indicator because I have confirmed results of the standalone drug. However, if I do not have confirmatory results, but a screening result was detected, then I want to retain the screening element (cannabinoids)so that way I know that those drugs were still present, even when the standalone drug result (thc, carboxy-thc) is absent

Reeza · Posted 06-15-2022 11:55 AM

You'll also need to provide mappings of which drugs are in the same classification.

Ie THC and cannibinoids are the same classification, Opiods, Oxycodone, Hydromorphone are all in the same classification.

Is it only the drugs listed in this shown table or does it extend beyond these drugs?

mtr91 · Posted 06-15-2022 12:10 PM

Cannabinoids (thc, carboxy-thc, thc/carboxy-thc)

Opiates (hydromorphone, morphine, oxycodone)

Benzodiazepines (midazolam)

That covers things for the sample, but there are dozens of other drugs for other categories that I have not listed here. I was hoping to get a basic example of how to execute this and then I could fill in the blanks and add additional conditions and categories.

ballardw · Posted 06-15-2022 12:31 PM

Here is a possible example of a RULE that can be programmed:

Delete cannabinoids from a Drug variable when one or more of the other drug variables contains one of thc, carboxy-thc, thc/carboxy-thc (is that the entire list???).

Or

Keep the value of cannabinoids only when NONE of the other drug variables contain any of thc, carboxy-thc, thc/carboxy-thc.

Nothing about "standalone" "screen" or such in the rule. Review these two suggestions and see if they are correct.

Which means that you need to provide such a rule for each and the comparison list has to been exhaustive (cover every possible value.

mtr91 · Posted 06-15-2022 12:45 PM

Yes. I had mocked up a rule similar to this such that:

if drug1-drug6 in ("thc" "carboxy-thc" "thc/carboxy-thc" "thc/carboxy-thc/11-hydroxy delta 9 thc") then delete "cannabinoids";

So, yes, if a row has 'thc' or 'carboxy-thc' or 'thc/carboxy-thc' or thc/carboxy-thc/11-hydroxy delta 9 thc' AND 'cannabinoids', I want to DELETE 'cannabinoids'.

BUT if a row has 'cannabinoids' and DOES NOT HAVE 'thc' or 'carboxy-thc' or 'thc/carboxy-thc' or thc/carboxy-thc/11-hydroxy delta 9 thc' then I want to KEEP'cannabinoids'

This is the same as your first statement. I can go through my list of dozens of drugs and edit the rule for each scenario. I just need some code that I can plug in to.

mkeintz · Posted 06-15-2022 12:08 PM

It looks to me that you have a "black list" of screening results that you want to blank out in the DRUG1-DRUG5 variables.

If so, you can establish a hash object.(think lookup table) of black list terms. Then step through the array of drug variables and check whether the array element is found in the hash object (i.e. the black_list.check method returns a zero).

The DO loop not only converts some array elements to blanks, but also stops when a blank element is already there, because it assumes that once a blank is found, all the elements to the right will also be blank.

Untested code follows, in the absence of sample data in the form of a working data step:

data want (drop=d _:);
  set have;
  if _n_=1 then do;
    length _screen $40;
    declare hash black_list ();
      black_list.definekey('_screen');
      black_list.definedone();
    do _screen = 'cannabinoids','benzodiazepines','opiates';
      black_list.add();
    end;
  end;
  array drg {*} $40 drug1-drug5 ;
  do d=1 to dim(drg) while (drg{d}^=' ');
    if black_list.check(key:drg{d})=0 then drg{d}=' ';
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mtr91 · Posted 06-15-2022 12:19 PM

Sorry. I tested the data step and it worked for me. Not sure why it isn't working for others.

I tested the code and it technically works, but another condition I need is retaining the screening element (cannabinoids) if the standalone/confirmatory results (thc, carboxy-thc, thc/carboxy-thc) is absent.

Your code gave me this:

id	drug1	drug2	drug3	drug4	drug5
001	thc/carboxy-thc
002	amphetamine		midazolam
003		carboxy-thc
004	~~cannabinoids~~	hydromorphone	morphine		oxycodone
005	amphetamines		thc
006	~~cannabinoids~~

The two cannabinoids with the strike through are missing from the output table. These are conditions where I still want to know that a person tested positive for cannabinoids when there was not a result for thc, carboxy-thc, etc.

I know this is a tricky request... Thank you for your efforts!

Tom · Posted 06-15-2022 01:12 PM

You need to create the data to drive the analysis.

So first let's make an example dataset for mapping the DRUG to the CLASS.

data classes ;
  input drug :$40. class :$40.;
cards;
thc/carboxy-thc cannabinoids
thc cannabinoids
carboxy-thc cannabinoids
amphetamine benzodiazepines
morphine opiates
;

Now let's run a data step that loads that dataset into an HASH object so we can use it to convert the drug into the class. You could also do that part using a format instead.

So for each element in the array of drugs we check if it has a CLASS associated with it. Then we blank out any values of the values in the array that have that class name as the drug name.


data want ;
  set have classes(obs=0) ;
  if _n_=1 then do;
    declare hash h(dataset:'classes');
    rc=h.definekey('drug');
    rc=h.definedata('class');
    rc=h.definedone();
  end;
  array drugs drug1-drug5 ;
  do index=1 to dim(drugs);
    if not h.find(key:drugs[index]) then do;
      next = whichc(class,of drugs[*]);
      do while (next) ;
        drugs[next]=' ';
        next = whichc(class,of drugs[*]);
      end;
    end;  
  end;
  drop rc drug class index next;
run;

Note this REQUIRES that you have the mapping.

It also will not work (without additional logic) if the DRUG name of some drug happens to match the CLASS name of some drug class.

If you know there are never any duplicate names in the variables in the array, that is you will never have 'cannibis' in more than one of the DRUG1 to DRUG5 variables, then you can reduce the inner most DO WHILE loop to a simple IF/THEN instead.

      if next then drugs[next]=' ';

mtr91 · Posted 06-15-2022 02:19 PM

It works! This is an intimidating chunk of code.. clearly I have much more to learn about SAS, but this is a great example to help me complete the task and similar future projects. Thanks for your help, I would not have been able to figure this out on my own 🙂

Tom · Posted 06-15-2022 03:14 PM

@mtr91 wrote:

It works! This is an intimidating chunk of code.. clearly I have much more to learn about SAS, but this is a great example to help me complete the task and similar future projects. Thanks for your help, I would not have been able to figure this out on my own 🙂

It is still going to be much easier to deal with this data if you transpose it to have a single DRUG variable and multiple observations when there are multiple drugs. Then your problem here becomes one of only keeping the CLASS names in the DRUG column when they are the only DRUG listed for that class.

data want;
  set have ;
  by id class drug;
  if class = drug and not (first.class and last.class) then delete;
run;

Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

Re: Deleting single data element from observation when condition is met

SAS Innovate 2025: Call for Content