Help with coding outcomes multiple date ranges: healthcare data

New Contributor
Posts: 3

Hi, 

 

I have a list of patients and whether and when they filled a diabetes drug, had a hospitalization claim for diabetes, and/or had an outpatient diagnosis for diabetes (DiabetesOutcome=1). I am having trouble handling the multiple dates.

 

ENROLID = person id

 

Drug:

  • DMDrug = Diabetes drug (0= no diabetes drug, 1= filled diabetes drug)
  • DMDrugDate1-DMDrugDate52 = all fill dates for diabetic drug

 

Hospital

  • InptPrimDMDx = Primary hospital diagnosis for diabetes (0= No, 1= Yes)
  • InptSecDMDx = Secondary hospital diagnosis for diabetes (0= No, 1= Yes)
  • InptPrimDMDx_Date1 to InptPrimDMDx_Date9 = all dates of hospitalization with primary diagnosis of diabetes
  • InptSecDMDx_Date1 to InptSecDMDx_Date4 =  all dates of hospitalization with secondary diagnosis of diabetes

Doctor office visits

  • OutptDMDx = Office visit with diabetes diagnosis (0= No, 1= Yes)
  • OutptDMDate_1 to OutptDMDate_38 = all dates for office visit with diabetes diagnosis

My decision rules for actual DiabetesOutcome = 1:

  1. If InptPrimDMDx = 1 then DiabetesOutcome = 1

OR

  2. If DMDrug = 1 AND DMDrugDate is within +/- 120 days of InptSecDMDx_Date OR within +/- 120 days of OutptDMDate then DiabetesOutcome = 1

OR

  3. If InptSecDMDx = 1 AND InptSecDMDx_Date is within +/- 120 days of DMDrugDate OR within +/- 120 days of OutptDMDate then DiabetesOutcome = 1

OR

  4. If OutptDMDx = 1 AND OutptDMDate is within +/- 120 days of DMDrugDate OR within +/- 120 days of InptSecDMDx_Date then DiabetesOutcome = 1

The date of DiabetesOutcome = 1 would then be the first date on which the drug was filled or an inpatient/outpatient claim occurred.

 

The dataset looks something like this:

ENROLID   InptPrimDx InptSecDMDx   InptSecDMDx InptSecDMDxDate1.... InptSecDMDxDate38    DMDrug    DMDrugDate1...52

1                         1             5/1/15                    1                  6/15/15     7/15/15                                                   1                  5/4/15

2                          0               .                           0                

 

Any help is greatly appreciated.

 

Thank you.

Contributor hbi
Contributor
Posts: 66

Re: Help with coding outcomes multiple date ranges: healthcare data

(updated - added sample data and added a possible solution)
 

Let us assume that your dataset structure looks like the one below.
 


DATA sample_dataset;
  ATTRIB  ENROLID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx length=8
          DMDrugDate1       -DMDrugDate52
          InptPrimDMDx_Date1-InptPrimDMDx_Date9
          InptSecDMDx_Date1 -InptSecDMDx_Date4  
          OutptDMDate_1     -OutptDMDate_38 format=date11.;
  CALL MISSING(of _ALL_);
RUN;

Generate some sample data.
 


DATA sample_dataset(drop=i j _Rand: _mod: );
  ATTRIB  ENROLID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx length=8
          DMDrugDate1-DMDrugDate52
          InptPrimDMDx_Date1-InptPrimDMDx_Date9
          InptSecDMDx_Date1-InptSecDMDx_Date4  
          OutptDMDate_1-OutptDMDate_38 format=date11.;
  CALL MISSING(of _ALL_);

  DO i=1 TO 100;
    EnrolID      = i;  /* sequential patient id; MONOTONIC() is undocumented */
    DMDrug       = ROUND(RANUNI(1));
    InptPrimDMDx = ROUND(RANUNI(2));
    InptSecDMDx  = ROUND(RANUNI(3));
    OutptDMDx    = ROUND(RANUNI(4));
    _RandID      = ROUND(RANUNI(5)*100000);
    _mod52       = MOD(_RandID, 52);
    _mod9        = MOD(_RandID, 9);
    _mod4        = MOD(_RandID, 4);
    _mod38       = MOD(_RandID, 38);
    _RandTiny    = ROUND(RANUNI(6)*50);
    _RandSmall   = ROUND(RANUNI(7)*500);
    CALL MISSING(j);

    ARRAY DMDrugDateArray{52} DMDrugDate1-DMDrugDate52;
    CALL MISSING(of DMDrugDateArray{*});
    DO j=1 TO 52;
      IF DMDrug=1 AND j<=_mod52 THEN DO;
        IF j=1 THEN DMDrugDateArray{j} = '15JAN2012'd + _RandSmall;
        ELSE DMDrugDateArray{j} = DMDrugDateArray{j-1} + _RandTiny;
      END;
    END;

    ARRAY hospPrimDateArray{9} InptPrimDMDx_Date1-InptPrimDMDx_Date9;
    CALL MISSING(of hospPrimDateArray{*});
    DO j=1 TO 9;
      IF InptPrimDMDx=1 AND j<=_mod9 THEN DO;
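        /* fall back to a base date when the patient has no drug fill */
        IF j=1 THEN hospPrimDateArray{j} = COALESCE(DMDrugDate1, '15JAN2012'd + _RandSmall) - ROUND(RANUNI(8)*200);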
        ELSE hospPrimDateArray{j} = hospPrimDateArray{j-1} + ROUND(RANUNI(9)*200);
      END;
    END;

    ARRAY hospSecDateArray{4} InptSecDMDx_Date1-InptSecDMDx_Date4;
    CALL MISSING(of hospSecDateArray{*});
    DO j=1 TO 4;
      IF InptSecDMDx=1 AND j<=_mod4 THEN DO;
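        /* fall back to a base date when the patient has no drug fill */
        IF j=1 THEN hospSecDateArray{j} = COALESCE(DMDrugDate1, '15JAN2012'd + _RandSmall) - ROUND(RANUNI(10)*200);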
        ELSE hospSecDateArray{j} = hospSecDateArray{j-1} + ROUND(RANUNI(11)*200);
      END;
    END;

    ARRAY OutptDMDateArray{38} OutptDMDate_1-OutptDMDate_38;
    CALL MISSING(of OutptDMDateArray{*});
    DO j=1 TO 38;
      IF OutptDMDx=1 AND j<=_mod38 THEN DO;
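        /* fall back to a base date when the patient has no drug fill */
        IF j=1 THEN OutptDMDateArray{j} = COALESCE(DMDrugDate1, '15JAN2012'd + _RandSmall) - ROUND(RANUNI(12)*100);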
        ELSE OutptDMDateArray{j} = OutptDMDateArray{j-1} + ROUND(RANUNI(13)*30);
      END;
    END;

    OUTPUT;
  END;

RUN;

You should transform the wide dataset into several tall datasets, with one row per patient per date. Once you do this, all kinds of manipulation become possible:
 


/* patient-level boolean variables */
DATA patient;
    SET sample_dataset(KEEP=EnrolID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx);
RUN;


/* a given patient has between 0 and 52 drug fill dates; missing dates are dropped */
DATA patient_drug(KEEP=EnrolID DMDrug i DMDrugDate);
    SET sample_dataset(KEEP=EnrolID DMDrug DMDrugDate: );
    ARRAY DMDrugDateArray[52] DMDrugDate1-DMDrugDate52;
    FORMAT DMDrugDate DATE11.;
    DO i=1 TO 52;
     DMDrugDate = DMDrugDateArray[i];
     IF DMDrugDate NE . THEN OUTPUT;
    END;
RUN;


/* a given patient has between 0 and 9 hospitalizations (primary diagnosis) */
DATA patient_hospital_prim(KEEP=EnrolID InptPrimDMDx i InptPrimDMDxDate);
    SET sample_dataset(KEEP=EnrolID InptPrimDMDx InptPrimDMDx_Date: );
    ARRAY hospPrimDateArray[9] InptPrimDMDx_Date1-InptPrimDMDx_Date9;
    FORMAT InptPrimDMDxDate DATE11.;
    DO i=1 TO 9;
     InptPrimDMDxDate = hospPrimDateArray[i];
     IF InptPrimDMDxDate NE . THEN OUTPUT;
    END;
RUN;


/* a given patient has between 0 and 4 hospitalizations (secondary diagnosis) */
DATA patient_hospital_sec(KEEP=EnrolID InptSecDMDx i InptSecDMDxDate);
    SET sample_dataset(KEEP=EnrolID InptSecDMDx InptSecDMDx_Date: );
    ARRAY hospSecDateArray[4] InptSecDMDx_Date1-InptSecDMDx_Date4;
    FORMAT InptSecDMDxDate DATE11.;
    DO i=1 TO 4;
     InptSecDMDxDate = hospSecDateArray[i];
     IF InptSecDMDxDate NE . THEN OUTPUT;
    END;
RUN;


/* a given patient has between 0 and 38 office visit dates */
DATA patient_office_visit(KEEP=EnrolID OutptDMDx i OutptDMDate);
    SET sample_dataset(KEEP=EnrolID OutptDMDx OutptDMDate_: );
    ARRAY OutptDMDateArray[38] OutptDMDate_1-OutptDMDate_38;
    FORMAT OutptDMDate DATE11.;
    DO i=1 TO 38;
     OutptDMDate = OutptDMDateArray[i];
     IF OutptDMDate NE . THEN OUTPUT;
    END;
RUN;

I did outcomes A and B for you. You can tackle outcomes C and D in much the same manner.
 


/*
=== outcome A === 
If InptPrimDMDx = 1 then DiabetesOutcome = 1

OR
=== outcome B === 
If (DMDrug =1 AND If DMDrugDate within +/- 120 days of InptSecDMDx_Date 
                     OR within +/- 120 days of OutptDMDate 
    then  DiabetesOutcome = 1);

OR
=== outcome C === 
If (InptSecDMDx =1  AND if InptSecDMDx_Date within +/- 120 days of DMDrugDate 
    OR within +/- 120 days of OutptDMDate then  DiabetesOutcome = 1);

OR
=== outcome D === 
If (OutptDMDx=1 AND if OutptDMDate within +/- 120 days of DMDrugDate 
    OR within +/- 120 days of InptSecDMDx then  DiabetesOutcome = 1)

Then the first date that drug was filled, or inpatient/outpatient claim would 
be the date of DiabetesOutcome = 1
*/

PROC SQL;
  CREATE TABLE outcome_a AS 
  SELECT patient.*
       , 1 AS DiabetesOutcome
  FROM patient
  WHERE InptPrimDMDx = 1;
QUIT;


PROC SQL;
  CREATE TABLE outcome_b AS 
  SELECT AA.ENROLID
       , AA.DMDrug
       , AA.InptPrimDMDx
       , AA.InptSecDMDx
       , AA.OutptDMDx
       , BB.DMDrugDate
       , MIN(ABS(BB.DMDrugDate - CC.InptSecDMDxDate)) AS DiffDays1
       , MIN(ABS(BB.DMDrugDate - DD.OutptDMDate))     AS DiffDays2
       , 1 AS DiabetesOutcome
  FROM patient as AA
  INNER JOIN patient_drug as BB
    ON BB.EnrolID = AA.EnrolID
  /* the +/- 120 day window is enforced in the join conditions */
  LEFT JOIN patient_hospital_sec as CC
    ON (CC.EnrolID = AA.EnrolID
        AND ABS(BB.DMDrugDate - CC.InptSecDMDxDate) <= 120)
  LEFT JOIN patient_office_visit as DD
    ON (DD.EnrolID = AA.EnrolID
        AND ABS(BB.DMDrugDate - DD.OutptDMDate) <= 120)
  WHERE AA.DMDrug = 1
    AND AA.InptPrimDMDx = 0
    /* keep fills matched by at least one claim (rule B is an OR);      */
    /* a missing date means no match, and "missing <= 120" is true in   */
    /* SAS, so the match is tested with NMISS rather than the window    */
    AND NMISS(CC.InptSecDMDxDate, DD.OutptDMDate) < 2
  GROUP BY AA.ENROLID, AA.DMDrug, AA.InptPrimDMDx
       , AA.InptSecDMDx, AA.OutptDMDx, BB.DMDrugDate
  ORDER BY AA.EnrolID, BB.DMDrugDate;
QUIT;

/* keep one row per patient: rows are sorted by fill date, so the earliest qualifying fill survives */
PROC SORT DATA=outcome_b nodupkey DUPOUT=outcome_b_dupes; 
  BY ENROLID; 
RUN;
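
For outcome C, a sketch along the same lines might look like the one below. It assumes the tall datasets built above (patient, patient_drug, patient_hospital_sec, patient_office_visit); the names outcome_c, FirstDrugDate and FirstOutptDate are made up for illustration, and I have not run it against real data. Because SAS treats a missing value as smaller than any number, a test such as ABS(.) <= 120 evaluates as true, so the 120-day window goes into the join conditions and NMISS(...) < 2 then keeps only hospitalizations with at least one match.

```sas
/* Sketch of outcome C: a secondary inpatient diagnosis within 120 days */
/* of a drug fill OR an office visit. Table and column names that do    */
/* not appear earlier in this post are hypothetical.                    */
PROC SQL;
  CREATE TABLE outcome_c AS
  SELECT AA.EnrolID
       , CC.InptSecDMDxDate
       , MIN(BB.DMDrugDate)  AS FirstDrugDate  FORMAT=DATE11.
       , MIN(DD.OutptDMDate) AS FirstOutptDate FORMAT=DATE11.
       , 1 AS DiabetesOutcome
  FROM patient AS AA
  INNER JOIN patient_hospital_sec AS CC
    ON CC.EnrolID = AA.EnrolID
  LEFT JOIN patient_drug AS BB
    ON (BB.EnrolID = AA.EnrolID
        AND ABS(CC.InptSecDMDxDate - BB.DMDrugDate) <= 120)
  LEFT JOIN patient_office_visit AS DD
    ON (DD.EnrolID = AA.EnrolID
        AND ABS(CC.InptSecDMDxDate - DD.OutptDMDate) <= 120)
  WHERE AA.InptSecDMDx = 1
    AND AA.InptPrimDMDx = 0
    /* at least one of the two left joins found a match */
    AND NMISS(BB.DMDrugDate, DD.OutptDMDate) < 2
  GROUP BY AA.EnrolID, CC.InptSecDMDxDate
  ORDER BY AA.EnrolID, CC.InptSecDMDxDate;
QUIT;
```

Outcome D would follow the same pattern with patient_office_visit as the inner-joined table and the other two tables left-joined.
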
Super User
Posts: 11,343

Re: Help with coding outcomes multiple date ranges: healthcare data

First question: Are your date variables SAS date-valued variables or something else?

Second: within the groups of date values are they in calendar order?

Third: You set a flag for there was an outcome but don't indicate that you want the data captured, is that correct? Sounds unlikely to me.

 

You'll likely be using arrays in a data step.

 

However, English is a bit fuzzy about exactly how to apply OR (inclusive or exclusive) and whether to distribute the comparisons.

For instance for

If DMDrug =1 AND If DMDrugDate within +/- 120 days of InptSecDMDx_Date OR within +/- 120 days of OutptDMDate then  DiabetesOutcome = 1;

Do you do the comparison for OutptDMDate only if DMDrug=1? Also, is it preferable to compare all of the DMDrugDate values with one value of InptSecDMDx_Date, or one value of DMDrugDate with each value of InptSecDMDx_Date before considering the next DMDrugDate, with some similar order of comparisons for the OutptDMDate choices? Is ANY value within that 120 days acceptable? Or are you looking for the earliest calendar date, or the one closest to the diagnosis date?
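
To make the array idea concrete, here is a minimal sketch of one possible reading of that rule. It assumes the date variables are true SAS dates, that OR is inclusive, that the OutptDMDate comparison only applies when DMDrug=1, and that any claim inside the 120-day window is acceptable. The dataset name rule_b_flag and the variables RuleB, RuleB_Date and hit are hypothetical; sample_dataset stands for the wide dataset described in the question. The dates within a group do not need to be in calendar order.

```sas
/* Sketch only: flags rule B and keeps the earliest qualifying fill date */
DATA rule_b_flag;
  SET sample_dataset;
  ARRAY drugDt{52} DMDrugDate1-DMDrugDate52;
  ARRAY secDt{4}   InptSecDMDx_Date1-InptSecDMDx_Date4;
  ARRAY outpDt{38} OutptDMDate_1-OutptDMDate_38;
  FORMAT RuleB_Date DATE11.;

  RuleB = 0;
  CALL MISSING(RuleB_Date);

  IF DMDrug = 1 THEN DO i = 1 TO DIM(drugDt);
    /* skip empty fill-date slots */
    IF MISSING(drugDt{i}) THEN CONTINUE;
    hit = 0;
    /* compare this fill date with every secondary inpatient date */
    DO j = 1 TO DIM(secDt) WHILE (hit = 0);
      /* guard first: a missing difference would pass "<= 120" in SAS */
      IF NOT MISSING(secDt{j}) THEN
        IF ABS(drugDt{i} - secDt{j}) <= 120 THEN hit = 1;
    END;
    /* if no inpatient match, compare with every office visit date */
    DO j = 1 TO DIM(outpDt) WHILE (hit = 0);
      IF NOT MISSING(outpDt{j}) THEN
        IF ABS(drugDt{i} - outpDt{j}) <= 120 THEN hit = 1;
    END;
    IF hit THEN DO;
      RuleB = 1;
      /* MIN ignores missing values, so the first hit sets the date */
      RuleB_Date = MIN(RuleB_Date, drugDt{i});
    END;
  END;

  DROP i j hit;
RUN;
```

If the answer to any of the questions above is different (for example, closest date rather than earliest), only the inner comparisons and the RuleB_Date assignment would need to change.
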
