BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nina4
Obsidian | Level 7

Hello everyone 

 

I have a dataset with drug claims and would like to flag patient, that have drug interactions. The dataset looks like that: 

data your_dataset;
input PatientID Drug Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 $;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;

a drug cannot make an interaction with itself. Therefore I would like to flag all lines of a patient that has an interaction between two drugs. How can I do that? 

 

The answer should look like this: 

 

PatientIDDrugInteraction_partner_1 Interaction_partner_2Interaction_partner_3Interaction
1Drug A1101
1Drug B0011
1Drug C0101
2Drug A1100
2Drug D0000
2Drug A1100
3Drug B0011
3Drug E1001

 

Thank you already in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Nina4,

 

For a small input dataset with, say, less than 50,000 observations such as

data have;
input PatientID Drug $ Interaction_partner_1-Interaction_partner_3;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;

or

data have;
call streaminit(27182818);
do PatientID=1 to 5000;
  do _n_=1 to rand('integer',15);
    Drug=byte(64+rand('integer',15));
    Interaction_partner_1=rand('bern',0.3);
    Interaction_partner_2=rand('bern',0.3);
    Interaction_partner_3=rand('bern',0.3);
    output;
  end;
end;
run; /* 40380 obs. */

(note that I use numeric interaction partner flags and a character variable Drug, not vice versa)

you could keep the coding simple and use PROC SQL with an EXISTS condition:

proc sql;
create table want as
select a.*, exists(select * from have b
                   where a.PatientID=b.PatientID & a.Drug ne b.Drug
                     & (  a.Interaction_partner_1 & (b.Interaction_partner_2 | b.Interaction_partner_3)
                        | b.Interaction_partner_1 & (a.Interaction_partner_2 | a.Interaction_partner_3)))
            as Interaction
from have a;
quit;

For medium or large datasets, however, a more efficient approach is advisable:

data want(drop=i j);
array i1[199] $200 _temporary_;  /* array for distinct drugs acting as interaction partner 1 */
array i23[199] $200 _temporary_; /* array for distinct drugs acting as interaction partner 2 or 3 */
do _n_=1 by 1 until(last.PatientID);
  set have;
  by PatientID;
  if Interaction_partner_1 & Drug ~in i1 then do;
    i+1;
    i1[i]=Drug;
  end;
  if (Interaction_partner_2 | Interaction_partner_3) & Drug ~in i23 then do;
    j+1;
    i23[j]=Drug;
  end;
end;
do _n_=1 to _n_;
  set have;
  Interaction =  Interaction_partner_1 & (j>1 | j=1 & i23[1] ne Drug)
               | (Interaction_partner_2 | Interaction_partner_3) & (i>1 | i=1 & i1[1] ne Drug);
  output;
end;
call missing (of i1[*], of i23[*], i, j);
run;

(assuming <200 distinct drugs per patient and drug names with <=200 characters).

 

For the larger HAVE dataset simulated above, this DATA step takes less than one second on my workstation, whereas the PROC SQL step takes about half a minute.

View solution in original post

9 REPLIES 9
ballardw
Super User

I think you need to go into a little detail about how we recognize an interaction.

Is "interaction_partner_1" supposed to mean Drug A ???

I can't tell where we know what drug might interact with which other or itself.

 

Nina4
Obsidian | Level 7

Thank you for your reply. I'm happy to provide further information. 

 

If a drug is an interaction partner_1 (=1) it can interaction with drugs that are an interaction_partner_2 (=1) or an interaction_partner_3 (=1). But if the drug itself is an interaction_partner_1 and an interaction_partner_2 it cannot interact with itself. 

 

To go into more detail, beeing an interaction_partner_1 means drug A is a substrate, being interaction_partner_2 meaning, it is also an inhibitor.  Drug B on the other hand is an inducer (interaction_partner_3). 

 

Does this provide enough information or do you need more details?

ballardw
Super User

Way too much jargon. I think we have a case of you are intimately involved with data (not uncommon) and think that your description is exhaustive enough to apply when it is not obvious to others outside of your specialty (also not uncommon).

 

Work through an example of exactly how Drug A would "interact" with Drug B.

Nina4
Obsidian | Level 7

Drug A interacts with Drug B because Drug A is an Interaction_partner_1 and Drug B is an interaction_partner_3. Drug A also interacts with Drug C, because it is an interaction_partner_1 and Drug C is an interaction_partner_2. 

Drug A does not interact with Drug D because Drug D is neither an interaction_partner_2 nor an interaction_partner_3. 

Drug A does not interact with Drug A, because it is the same Drug. 

Drug A would also interact with Drug Z (not in the sample dataset) if drug Z is an interaction_partner_1, because Drug A is an interaction_partner_2. 

Drug B and Drug C do not interact, because neither of them is an interaction_partner_1. 

Drug B and Drug E interact, because Drug B is an interaction_partner_3 and Drug E is an interaction_partner_1.

 

Does this help?

FreelanceReinh
Jade | Level 19

Hello @Nina4,

 

For a small input dataset with, say, less than 50,000 observations such as

data have;
input PatientID Drug $ Interaction_partner_1-Interaction_partner_3;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;

or

data have;
call streaminit(27182818);
do PatientID=1 to 5000;
  do _n_=1 to rand('integer',15);
    Drug=byte(64+rand('integer',15));
    Interaction_partner_1=rand('bern',0.3);
    Interaction_partner_2=rand('bern',0.3);
    Interaction_partner_3=rand('bern',0.3);
    output;
  end;
end;
run; /* 40380 obs. */

(note that I use numeric interaction partner flags and a character variable Drug, not vice versa)

you could keep the coding simple and use PROC SQL with an EXISTS condition:

proc sql;
create table want as
select a.*, exists(select * from have b
                   where a.PatientID=b.PatientID & a.Drug ne b.Drug
                     & (  a.Interaction_partner_1 & (b.Interaction_partner_2 | b.Interaction_partner_3)
                        | b.Interaction_partner_1 & (a.Interaction_partner_2 | a.Interaction_partner_3)))
            as Interaction
from have a;
quit;

For medium or large datasets, however, a more efficient approach is advisable:

data want(drop=i j);
array i1[199] $200 _temporary_;  /* array for distinct drugs acting as interaction partner 1 */
array i23[199] $200 _temporary_; /* array for distinct drugs acting as interaction partner 2 or 3 */
do _n_=1 by 1 until(last.PatientID);
  set have;
  by PatientID;
  if Interaction_partner_1 & Drug ~in i1 then do;
    i+1;
    i1[i]=Drug;
  end;
  if (Interaction_partner_2 | Interaction_partner_3) & Drug ~in i23 then do;
    j+1;
    i23[j]=Drug;
  end;
end;
do _n_=1 to _n_;
  set have;
  Interaction =  Interaction_partner_1 & (j>1 | j=1 & i23[1] ne Drug)
               | (Interaction_partner_2 | Interaction_partner_3) & (i>1 | i=1 & i1[1] ne Drug);
  output;
end;
call missing (of i1[*], of i23[*], i, j);
run;

(assuming <200 distinct drugs per patient and drug names with <=200 characters).

 

For the larger HAVE dataset simulated above, this DATA step takes less than one second on my workstation, whereas the PROC SQL step takes about half a minute.

Nina4
Obsidian | Level 7
Thank you! The second option works perfectly with my dataset!
Ksharp
Super User

Your question is too ambiguous .
Assuming I understood what you mean.

And assuming there are only five drugs, you can extend it by yourself.

 


data your_dataset;
input PatientID Drug $ Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 ;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;
proc sql;
create table temp as
select distinct * from your_dataset;
quit;
proc transpose data=temp out=temp2(where=(col1=1));
by PatientID Drug;
var Interaction_partner: ;
run;
data DrugA(rename=(Drug=DrugA)) 
      DrugB(rename=(Drug=DrugB)) 
      DrugC(rename=(Drug=DrugC))  
      DrugD(rename=(Drug=DrugD))  
      DrugE(rename=(Drug=DrugE))  ;
 set temp2;
 select(Drug);
 when('DrugA') output DrugA ;
 when('DrugB') output DrugB ;
 when('DrugC') output DrugC ;
 when('DrugD') output DrugD ;
 when('DrugE') output DrugE ;
 otherwise;
 end;
 keep PatientID Drug;
run;
proc sql;
create table levels as
select PatientID,count(distinct Drug) as levels from temp group by PatientID;


create table x1 as 
select coalesce(DrugA.PatientID,DrugB.PatientID) as PatientID,DrugA,DrugB from DrugA full join DrugB on DrugA.PatientID=DrugB.PatientID;
create table x2 as 
select coalesce(x1.PatientID,DrugC.PatientID) as PatientID,DrugA,DrugB,DrugC from x1 full join DrugC on x1.PatientID=DrugC.PatientID;
create table x3 as 
select coalesce(x2.PatientID,DrugD.PatientID) as PatientID,DrugA,DrugB,DrugC,DrugD from x2 full join DrugD on x2.PatientID=DrugD.PatientID;
create table x4 as 
select coalesce(x3.PatientID,DrugE.PatientID) as PatientID,DrugA,DrugB,DrugC,DrugD,DrugE from x3 full join DrugE on x3.PatientID=DrugE.PatientID;


create table temp3 as
select x4.*,levels from x4 left join levels on x4.PatientID=levels.PatientID;
create table temp4 as
select *,5-cmiss(DrugA,DrugB,DrugC,DrugD,DrugE) as n from temp3;  /*Here 5 stands for 5 Drugs*/

create table k as
select PatientID,max(levels)=max(n) as Interaction from temp4 group by PatientID;
create table want as
select a.*,Interaction from your_dataset as a left join k on a.PatientID=k.PatientID order by PatientID,Drug ;
quit;

 

Nina4
Obsidian | Level 7
Unfortunately this option will not work, as I have an unlimited number of different drugs.
Ksharp
Super User

Or Maybe you want this ?

 


data your_dataset;
input PatientID Drug $ Interaction_partner_1 Interaction_partner_2 Interaction_partner_3 ;
datalines;
1 DrugA 1 1 0
1 DrugB 0 0 1
1 DrugC 0 1 0
2 DrugA 1 1 0
2 DrugD 0 0 0
2 DrugA 1 1 0
3 DrugB 0 0 1
3 DrugE 1 0 0
;
proc sql;
create table temp as
select distinct * from your_dataset;

create table n_row as
select PatientID,sum(max(Interaction_partner_1,Interaction_partner_2,Interaction_partner_3)) as n_row  from temp group by PatientID;
create table n_col as
select PatientID,sum(max(Interaction_partner_1),max(Interaction_partner_2),max(Interaction_partner_3)) as n_col  from temp group by PatientID;
create table n_drug as
select PatientID,count(*) as n_drug from temp group by PatientID;
create table k as
select a.*,n_col,n_drug,(n_row=n_drug and n_col=n_col) as want from n_row as a,n_col as b,n_drug as c where a.PatientID=b.PatientID and a.PatientID=c.PatientID;

create table want as
select a.*,want from your_dataset as a left join k on a.PatientID=k.PatientID;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 753 views
  • 1 like
  • 4 in conversation