Solved: Re: How to flag if two variables per patient are met, but not on the s...

Nina4 · Posted 03-15-2024 11:19 AM

Hello everyone

I have a dataset with drug claims and would like to flag patients that have drug interactions. The dataset looks like this:

data your_dataset;
input PatientID Drug Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 $;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;

A drug cannot interact with itself. Therefore, I would like to flag all lines of a patient who has an interaction between two different drugs. How can I do that?

The answer should look like this:

PatientID	Drug	Interaction_partner_1	Interaction_partner_2	Interaction_partner_3	Interaction
1	Drug A	1	1	0	1
1	Drug B	0	0	1	1
1	Drug C	0	1	0	1
2	Drug A	1	1	0	0
2	Drug D	0	0	0	0
2	Drug A	1	1	0	0
3	Drug B	0	0	1	1
3	Drug E	1	0	0	1

Thank you in advance!

ballardw · Posted 03-15-2024 11:31 AM

I think you need to go into a little detail about how we recognize an interaction.

Is "interaction_partner_1" supposed to mean Drug A?

I can't tell how we are supposed to know which drug might interact with which other drug, or with itself.

Nina4 · Posted 03-15-2024 11:37 AM

Thank you for your reply. I'm happy to provide further information.

If a drug is an interaction_partner_1 (=1), it can interact with drugs that are an interaction_partner_2 (=1) or an interaction_partner_3 (=1). But if the drug itself is both an interaction_partner_1 and an interaction_partner_2, it cannot interact with itself.

To go into more detail: being an interaction_partner_1 means that Drug A is a substrate, and being an interaction_partner_2 means that it is also an inhibitor. Drug B, on the other hand, is an inducer (interaction_partner_3).

Does this provide enough information or do you need more details?

ballardw · Posted 03-15-2024 11:45 AM

Way too much jargon. I think this is a case where you are intimately involved with the data (not uncommon) and think that your description is exhaustive enough, when it is not obvious to others outside of your specialty (also not uncommon).

Work through an example of exactly how Drug A would "interact" with Drug B.

Nina4 · Posted 03-15-2024 11:53 AM

Drug A interacts with Drug B because Drug A is an Interaction_partner_1 and Drug B is an interaction_partner_3. Drug A also interacts with Drug C, because it is an interaction_partner_1 and Drug C is an interaction_partner_2.

Drug A does not interact with Drug D because Drug D is neither an interaction_partner_2 nor an interaction_partner_3.

Drug A does not interact with Drug A, because it is the same Drug.

Drug A would also interact with Drug Z (not in the sample dataset) if drug Z is an interaction_partner_1, because Drug A is an interaction_partner_2.

Drug B and Drug C do not interact, because neither of them is an interaction_partner_1.

Drug B and Drug E interact, because Drug B is an interaction_partner_3 and Drug E is an interaction_partner_1.

Does this help?
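
The pairwise rule spelled out in the examples above can be written as a small predicate. This is only a sketch in Python rather than SAS, meant as a cross-check of the rule itself; the names `Drug` and `interacts` and the short field names `p1`, `p2`, `p3` are hypothetical and stand for Interaction_partner_1 to Interaction_partner_3.

```python
from typing import NamedTuple


class Drug(NamedTuple):
    name: str
    p1: int  # Interaction_partner_1 flag (assumed 0/1)
    p2: int  # Interaction_partner_2 flag (assumed 0/1)
    p3: int  # Interaction_partner_3 flag (assumed 0/1)


def interacts(a: Drug, b: Drug) -> bool:
    """True if one drug is a partner 1 and the other, different drug is a partner 2 or 3."""
    if a.name == b.name:
        return False  # a drug never interacts with itself
    return bool((a.p1 and (b.p2 or b.p3)) or (b.p1 and (a.p2 or a.p3)))


# Drugs from the sample dataset in the question
A = Drug("DrugA", 1, 1, 0)
B = Drug("DrugB", 0, 0, 1)
C = Drug("DrugC", 0, 1, 0)
D = Drug("DrugD", 0, 0, 0)
E = Drug("DrugE", 1, 0, 0)

for x, y in [(A, B), (A, C), (A, D), (A, A), (B, C), (B, E)]:
    print(x.name, y.name, interacts(x, y))
```

The six pairs printed are the ones discussed in the post above, so the output can be compared line by line with those statements.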

FreelanceReinh · Posted 03-16-2024 06:54 AM

Hello @Nina4,

For a small input dataset with, say, fewer than 50,000 observations such as

data have;
input PatientID Drug $ Interaction_partner_1-Interaction_partner_3;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;

or

data have;
call streaminit(27182818);
do PatientID=1 to 5000;
  do _n_=1 to rand('integer',15);
    Drug=byte(64+rand('integer',15));
    Interaction_partner_1=rand('bern',0.3);
    Interaction_partner_2=rand('bern',0.3);
    Interaction_partner_3=rand('bern',0.3);
    output;
  end;
end;
run; /* 40380 obs. */

(note that I use numeric interaction partner flags and a character variable Drug, not vice versa)

you could keep the coding simple and use PROC SQL with an EXISTS condition:

proc sql;
create table want as
select a.*, exists(select * from have b
                   where a.PatientID=b.PatientID & a.Drug ne b.Drug
                     & (  a.Interaction_partner_1 & (b.Interaction_partner_2 | b.Interaction_partner_3)
                        | b.Interaction_partner_1 & (a.Interaction_partner_2 | a.Interaction_partner_3)))
            as Interaction
from have a;
quit;

For medium or large datasets, however, a more efficient approach is advisable:

data want(drop=i j);
array i1[199] $200 _temporary_;  /* array for distinct drugs acting as interaction partner 1 */
array i23[199] $200 _temporary_; /* array for distinct drugs acting as interaction partner 2 or 3 */
/* First pass over one patient (HAVE must be sorted by PatientID): collect the distinct drugs per role */
do _n_=1 by 1 until(last.PatientID);
  set have;
  by PatientID;
  if Interaction_partner_1 & Drug ~in i1 then do;
    i+1;
    i1[i]=Drug;
  end;
  if (Interaction_partner_2 | Interaction_partner_3) & Drug ~in i23 then do;
    j+1;
    i23[j]=Drug;
  end;
end;
/* Second pass over the same observations: flag a row if a different drug of this patient fills the complementary role */
do _n_=1 to _n_;
  set have;
  Interaction =  Interaction_partner_1 & (j>1 | j=1 & i23[1] ne Drug)
               | (Interaction_partner_2 | Interaction_partner_3) & (i>1 | i=1 & i1[1] ne Drug);
  output;
end;
/* Reset the arrays and counters before the next patient */
call missing (of i1[*], of i23[*], i, j);
run;

(assuming <200 distinct drugs per patient and drug names with <=200 characters).

For the larger HAVE dataset simulated above, this DATA step takes less than one second on my workstation, whereas the PROC SQL step takes about half a minute.
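
The idea behind the faster DATA step (collect the distinct drugs per role for each patient once, then flag each row against those collections) can be sketched outside SAS as well. The following Python version is only an illustration of that two-pass idea, not a translation of the SAS code; the function name `flag_interactions` and the tuple layout are assumptions. Unlike the BY-group processing above, it does not need the input to be sorted by patient.

```python
from collections import defaultdict

# Sample data from the question: (PatientID, Drug, partner_1, partner_2, partner_3)
rows = [
    (1, "DrugA", 1, 1, 0),
    (1, "DrugB", 0, 0, 1),
    (1, "DrugC", 0, 1, 0),
    (2, "DrugA", 1, 1, 0),
    (2, "DrugD", 0, 0, 0),
    (2, "DrugA", 1, 1, 0),
    (3, "DrugB", 0, 0, 1),
    (3, "DrugE", 1, 0, 0),
]


def flag_interactions(rows):
    """Return one 0/1 flag per input row, in input order."""
    p1_drugs = defaultdict(set)   # per patient: distinct drugs acting as partner 1
    p23_drugs = defaultdict(set)  # per patient: distinct drugs acting as partner 2 or 3

    # Pass 1: collect the distinct drugs per role and patient.
    for pid, drug, p1, p2, p3 in rows:
        if p1:
            p1_drugs[pid].add(drug)
        if p2 or p3:
            p23_drugs[pid].add(drug)

    # Pass 2: a row is flagged if a *different* drug fills the complementary role.
    flags = []
    for pid, drug, p1, p2, p3 in rows:
        other_p23 = p23_drugs[pid] - {drug}
        other_p1 = p1_drugs[pid] - {drug}
        hit = (p1 and other_p23) or ((p2 or p3) and other_p1)
        flags.append(1 if hit else 0)
    return flags


flags = flag_interactions(rows)
print(flags)  # [1, 1, 1, 0, 0, 0, 1, 1]
```

The printed flags match the Interaction column of the expected result in the question. Each row is touched twice, so the work grows roughly linearly with the number of observations, whereas a pairwise comparison within each patient grows with the square of the patient's row count.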

Nina4 · Posted 03-18-2024 06:21 AM

Thank you! The second option works perfectly with my dataset!

Ksharp · Posted 03-18-2024 03:58 AM

Your question is too ambiguous.
Assuming I understood what you mean, and assuming there are only five drugs (you can extend it yourself):


data your_dataset;
input PatientID Drug $ Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 ;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;
proc sql;
create table temp as
select distinct * from your_dataset;
quit;
proc transpose data=temp out=temp2(where=(col1=1));
by PatientID Drug;
var Interaction_partner: ;
run;
data DrugA(rename=(Drug=DrugA)) 
      DrugB(rename=(Drug=DrugB)) 
      DrugC(rename=(Drug=DrugC))  
      DrugD(rename=(Drug=DrugD))  
      DrugE(rename=(Drug=DrugE))  ;
 set temp2;
 select(Drug);
 when('DrugA') output DrugA ;
 when('DrugB') output DrugB ;
 when('DrugC') output DrugC ;
 when('DrugD') output DrugD ;
 when('DrugE') output DrugE ;
 otherwise;
 end;
 keep PatientID Drug;
run;
proc sql;
create table levels as
select PatientID,count(distinct Drug) as levels from temp group by PatientID;


create table x1 as 
select coalesce(DrugA.PatientID,DrugB.PatientID) as PatientID,DrugA,DrugB from DrugA full join DrugB on DrugA.PatientID=DrugB.PatientID;
create table x2 as 
select coalesce(x1.PatientID,DrugC.PatientID) as PatientID,DrugA,DrugB,DrugC from x1 full join DrugC on x1.PatientID=DrugC.PatientID;
create table x3 as 
select coalesce(x2.PatientID,DrugD.PatientID) as PatientID,DrugA,DrugB,DrugC,DrugD from x2 full join DrugD on x2.PatientID=DrugD.PatientID;
create table x4 as 
select coalesce(x3.PatientID,DrugE.PatientID) as PatientID,DrugA,DrugB,DrugC,DrugD,DrugE from x3 full join DrugE on x3.PatientID=DrugE.PatientID;


create table temp3 as
select x4.*,levels from x4 left join levels on x4.PatientID=levels.PatientID;
create table temp4 as
select *,5-cmiss(DrugA,DrugB,DrugC,DrugD,DrugE) as n from temp3;  /*Here 5 stands for 5 Drugs*/

create table k as
select PatientID,max(levels)=max(n) as Interaction from temp4 group by PatientID;
create table want as
select a.*,Interaction from your_dataset as a left join k on a.PatientID=k.PatientID order by PatientID,Drug ;
quit;

Nina4 · Posted 03-18-2024 06:22 AM

Unfortunately this option will not work, as I have an unlimited number of different drugs.

Ksharp · Posted 03-18-2024 05:06 AM

Or maybe you want this?


data your_dataset;
input PatientID Drug $ Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 ;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;
proc sql;
create table temp as
select distinct * from your_dataset;

create table n_row as
select PatientID,sum(max(Interaction_partner_1,Interaction_partner_2,Interaction_partner_3)) as n_row  from temp group by PatientID;
create table n_col as
select PatientID,sum(max(Interaction_partner_1),max(Interaction_partner_2),max(Interaction_partner_3)) as n_col  from temp group by PatientID;
create table n_drug as
select PatientID,count(*) as n_drug from temp group by PatientID;
create table k as
/* Note: n_col=n_col is always true, so only n_row=n_drug decides the flag */
select a.*,n_col,n_drug,(n_row=n_drug and n_col=n_col) as want from n_row as a,n_col as b,n_drug as c where a.PatientID=b.PatientID and a.PatientID=c.PatientID;

create table want as
select a.*,want from your_dataset as a left join k on a.PatientID=k.PatientID;
quit;
