Hi,
I have a list of patients and whether and when they filled a diabetes drug, had a hospitalization claim for diabetes, and/or had an outpatient diagnosis for diabetes (DiabetesOutcome=1). I'm having trouble handling the multiple dates.
ENROLID = person id
Drug:
Hospital
Doctor office visits
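My decision rules for actual DiabetesOutcome = 1:
If InptPrimDMDx = 1 then DiabetesOutcome = 1
OR
If DMDrug = 1 AND DMDrugDate is within +/- 120 days of InptSecDMDx_Date or within +/- 120 days of OutptDMDate then DiabetesOutcome = 1
OR
If InptSecDMDx = 1 AND InptSecDMDx_Date is within +/- 120 days of DMDrugDate or within +/- 120 days of OutptDMDate then DiabetesOutcome = 1
OR
If OutptDMDx = 1 AND OutptDMDate is within +/- 120 days of DMDrugDate or within +/- 120 days of InptSecDMDx_Date then DiabetesOutcome = 1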
Then the first date that drug was filled, or inpatient/outpatient claim would be the date of DiabetesOutcome =1
The dataset looks something like this:
ENROLID InptPrimDx InptSecDMDx InptSecDMDx InptSecDMDxDate1.... InptSecDMDxDate38 DMDrug DMDrugDate1...52
1 1 5/1/15 1 6/15/15 7/15/15 1 5/4/15
2 0 . 0
Any help is greatly appreciated.
Thank you.
(updated - added sample data and added a possible solution)
Let us assume that your dataset structure looks like the one below.
DATA sample_dataset;
ATTRIB ENROLID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx length=8
DMDrugDate1 -DMDrugDate52
InptPrimDMDx_Date1-InptPrimDMDx_Date9
InptSecDMDx_Date1 -InptSecDMDx_Date4
OutptDMDate_1 -OutptDMDate_38 format=date11.;
CALL MISSING(of _ALL_);
RUN;
Generate some sample data.
DATA sample_dataset(drop=i j _Rand: _mod: );
ATTRIB ENROLID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx length=8
DMDrugDate1-DMDrugDate52
InptPrimDMDx_Date1-InptPrimDMDx_Date9
InptSecDMDx_Date1-InptSecDMDx_Date4
OutptDMDate_1-OutptDMDate_38 format=date11.;
CALL MISSING(of _ALL_);
DO i=1 TO 100;
EnrolID = i; /* loop counter as the person id; avoids the undocumented MONOTONIC() function */
DMDrug = ROUND(RANUNI(1));
InptPrimDMDx = ROUND(RANUNI(2));
InptSecDMDx = ROUND(RANUNI(3));
OutptDMDx = ROUND(RANUNI(4));
_RandID = ROUND(RANUNI(5)*100000);
_mod52 = MOD(_RandID, 52);
_mod9 = MOD(_RandID, 9);
_mod4 = MOD(_RandID, 4);
_mod38 = MOD(_RandID, 38);
_RandTiny = ROUND(RANUNI(6)*50);
_RandSmall = ROUND(RANUNI(7)*500);
CALL MISSING(j);
ARRAY DMDrugDateArray{52} DMDrugDate1-DMDrugDate52;
CALL MISSING(of DMDrugDateArray{*});
DO j=1 TO 52;
IF DMDrug=1 AND j<=_mod52 THEN DO;
IF j=1 THEN DMDrugDateArray{j} = '15JAN2012'd + _RandSmall;
ELSE DMDrugDateArray{j} = DMDrugDateArray{j-1} + _RandTiny;
END;
END;
ARRAY hospPrimDateArray{9} InptPrimDMDx_Date1-InptPrimDMDx_Date9;
CALL MISSING(of hospPrimDateArray{*});
DO j=1 TO 9;
IF InptPrimDMDx=1 AND j<=_mod9 THEN DO;
IF j=1 THEN hospPrimDateArray{j} = DMDrugDate1 - ROUND(RANUNI(8)*200);
ELSE hospPrimDateArray{j} = hospPrimDateArray{j-1} + ROUND(RANUNI(9)*200);
END;
END;
ARRAY hospSecDateArray{4} InptSecDMDx_Date1-InptSecDMDx_Date4;
CALL MISSING(of hospSecDateArray{*});
DO j=1 TO 4;
IF InptSecDMDx=1 AND j<=_mod4 THEN DO;
IF j=1 THEN hospSecDateArray{j} = DMDrugDate1 - ROUND(RANUNI(10)*200);
ELSE hospSecDateArray{j} = hospSecDateArray{j-1} + ROUND(RANUNI(11)*200);
END;
END;
ARRAY OutptDMDateArray{38} OutptDMDate_1-OutptDMDate_38;
CALL MISSING(of OutptDMDateArray{*});
DO j=1 TO 38;
IF OutptDMDx=1 AND j<=_mod38 THEN DO;
IF j=1 THEN OutptDMDateArray{j} = DMDrugDate1 - ROUND(RANUNI(12)*100);
ELSE OutptDMDateArray{j} = OutptDMDateArray{j-1} + ROUND(RANUNI(13)*30);
END;
END;
OUTPUT;
END;
RUN;
You should transform the wide dataset into several tall datasets, with one row per patient per date. Once you do this, all kinds of manipulation become possible:
/* patient-level boolean variables */
DATA patient;
SET sample_dataset(KEEP=EnrolID DMDrug InptPrimDMDx InptSecDMDx OutptDMDx);
RUN;
/* a given patient has between 0 and 52 drug fill dates; only non-missing dates are output */
DATA patient_drug(KEEP=EnrolID DMDrug i DMDrugDate);
SET sample_dataset(KEEP=EnrolID DMDrug DMDrugDate: );
ARRAY DMDrugDateArray[52] DMDrugDate1-DMDrugDate52;
FORMAT DMDrugDate DATE11.;
DO i=1 TO 52;
DMDrugDate = DMDrugDateArray[i];
IF DMDrugDate NE . THEN OUTPUT;
END;
RUN;
/* a given patient has between 0 and 9 hospitalizations (primary) */
DATA patient_hospital_prim(KEEP=EnrolID InptPrimDMDx i InptPrimDMDxDate);
SET sample_dataset(KEEP=EnrolID InptPrimDMDx InptPrimDMDx_Date: );
ARRAY hospPrimDateArray[9] InptPrimDMDx_Date1-InptPrimDMDx_Date9;
FORMAT InptPrimDMDxDate DATE11.;
DO i=1 TO 9;
InptPrimDMDxDate = hospPrimDateArray[i];
IF InptPrimDMDxDate NE . THEN OUTPUT;
END;
RUN;
/* a given patient has between 0 and 4 hospitalizations (secondary) */
DATA patient_hospital_sec(KEEP=EnrolID InptSecDMDx i InptSecDMDxDate);
SET sample_dataset(KEEP=EnrolID InptSecDMDx InptSecDMDx_Date: );
ARRAY hospSecDateArray[4] InptSecDMDx_Date1-InptSecDMDx_Date4;
FORMAT InptSecDMDxDate DATE11.;
DO i=1 TO 4;
InptSecDMDxDate = hospSecDateArray[i];
IF InptSecDMDxDate NE . THEN OUTPUT;
END;
RUN;
/* a given patient has between 0 and 38 office visit dates */
DATA patient_office_visit(KEEP=EnrolID OutptDMDx i OutptDMDate);
SET sample_dataset(KEEP=EnrolID OutptDMDx OutptDMDate_: );
ARRAY OutptDMDateArray[38] OutptDMDate_1-OutptDMDate_38;
FORMAT OutptDMDate DATE11.;
DO i=1 TO 38;
OutptDMDate = OutptDMDateArray[i];
IF OutptDMDate NE . THEN OUTPUT;
END;
RUN;
I did outcomes A and B for you. You can tackle outcomes C and D in much the same manner.
/*
=== outcome A ===
If InptPrimDMDx = 1 then DiabetesOutcome = 1
OR
=== outcome B ===
If (DMDrug =1 AND If DMDrugDate within +/- 120 days of InptSecDMDx_Date
OR within +/- 120 days of OutptDMDate
then DiabetesOutcome = 1);
OR
=== outcome C ===
If (InptSecDMDx =1 AND if InptSecDMDx_Date within +/- 120 days of DMDrugDate
OR within +/- 120 days of OutptDMDate then DiabetesOutcome = 1);
OR
=== outcome D ===
If (OutptDMDx=1 AND if OutptDMDate within +/- 120 days of DMDrugDate
OR within +/- 120 days of InptSecDMDx then DiabetesOutcome = 1)
Then the first date that drug was filled, or inpatient/outpatient claim would
be the date of DiabetesOutcome = 1
*/
PROC SQL;
CREATE TABLE outcome_a AS
SELECT patient.*
, 1 AS DiabetesOutcome
FROM patient
WHERE InptPrimDMDx = 1;
QUIT;
PROC SQL;
CREATE TABLE outcome_b AS
SELECT AA.ENROLID
, AA.DMDrug
, AA.InptPrimDMDx
, AA.InptSecDMDx
, AA.OutptDMDx
, BB.DMDrugDate
, MIN(ABS(BB.DMDrugDate - CC.InptSecDMDxDate)) AS DiffDays1
, MIN(ABS(BB.DMDrugDate - DD.OutptDMDate)) AS DiffDays2
, 1 AS DiabetesOutcome
FROM patient as AA
INNER JOIN patient_drug as BB
ON BB.EnrolID = AA.EnrolID
LEFT JOIN patient_hospital_sec as CC
ON (CC.EnrolID = AA.EnrolID
AND ABS(BB.DMDrugDate - CC.InptSecDMDxDate) <= 120)
LEFT JOIN patient_office_visit as DD
ON (DD.EnrolID = AA.EnrolID
AND ABS(BB.DMDrugDate - DD.OutptDMDate) <= 120)
WHERE AA.DMDrug = 1
AND AA.InptPrimDMDx = 0
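/* rule B is an OR, so require at least one matched date rather than both; */
/* the guard is still needed because a missing value compares as <= 120 in SAS */
AND NMISS(CC.InptSecDMDxDate, DD.OutptDMDate) < 2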
AND (ABS(BB.DMDrugDate - CC.InptSecDMDxDate) <= 120
OR ABS(BB.DMDrugDate - DD.OutptDMDate) <= 120)
GROUP BY AA.ENROLID, AA.DMDrug, AA.InptPrimDMDx
, AA.InptSecDMDx, AA.OutptDMDx, BB.DMDrugDate
ORDER BY AA.EnrolID, DMDrugDate, MIN(DiffDays1, DiffDays2);
QUIT;
/* eliminate dupes as the earliest date should suffice */
PROC SORT DATA=outcome_b nodupkey DUPOUT=outcome_b_dupes;
BY ENROLID;
RUN;
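Outcome C can be approached the same way. The following is only a sketch, not a tested solution: it assumes the tall datasets patient_hospital_sec, patient_drug and patient_office_visit built above, and it swaps the LEFT JOINs for correlated EXISTS subqueries so that unmatched rows never produce missing differences. The table name outcome_c and the column name FirstQualifyingDate are made up for illustration. Unlike outcome B, it does not exclude patients who already qualify under outcome A.

```sas
/* outcome C (sketch): a secondary inpatient diagnosis date within      */
/* 120 days of any drug fill date or any outpatient diagnosis date      */
PROC SQL;
  CREATE TABLE outcome_c AS
  SELECT CC.EnrolID
       , MIN(CC.InptSecDMDxDate) AS FirstQualifyingDate FORMAT=DATE11.
       , 1 AS DiabetesOutcome
  FROM patient_hospital_sec AS CC
  WHERE CC.InptSecDMDx = 1
    AND (   EXISTS (SELECT 1
                    FROM patient_drug AS BB
                    WHERE BB.EnrolID = CC.EnrolID
                      AND ABS(CC.InptSecDMDxDate - BB.DMDrugDate) <= 120)
         OR EXISTS (SELECT 1
                    FROM patient_office_visit AS DD
                    WHERE DD.EnrolID = CC.EnrolID
                      AND ABS(CC.InptSecDMDxDate - DD.OutptDMDate) <= 120))
  GROUP BY CC.EnrolID;   /* one row per patient, earliest qualifying date */
QUIT;
```

Because the tall datasets hold only non-missing dates, the ABS comparisons never see a missing value, and the GROUP BY with MIN removes the need for a separate de-duplication step. Outcome D would follow the same pattern with patient_office_visit as the driving table.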
First question: are your date variables SAS date-valued variables or something else?
Second: within each group of date values, are they in calendar order?
Third: you set a flag indicating that there was an outcome but don't say that you want the data captured. Is that correct? It sounds unlikely to me.
You'll likely be using arrays in a data step.
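As a rough illustration of the array approach, here is a sketch for one rule only (a drug fill within 120 days of a secondary inpatient or outpatient diagnosis date). It assumes the wide layout and variable names of the sample_dataset posted earlier in this thread and that all dates are true SAS date values. The output names outcome_b_check, DrugNearClaim and FirstMatchedDrugDate are hypothetical, and a non-missing fill date is treated as equivalent to DMDrug = 1.

```sas
DATA outcome_b_check;
  SET sample_dataset;
  ARRAY drugDates{52}  DMDrugDate1-DMDrugDate52;
  ARRAY secDates{4}    InptSecDMDx_Date1-InptSecDMDx_Date4;
  ARRAY outptDates{38} OutptDMDate_1-OutptDMDate_38;
  FORMAT FirstMatchedDrugDate DATE11.;

  DrugNearClaim = 0;
  CALL MISSING(FirstMatchedDrugDate);

  DO i = 1 TO DIM(drugDates);
    IF MISSING(drugDates{i}) THEN CONTINUE;   /* skip unused date slots */
    matched = 0;
    /* compare this fill date with every secondary inpatient date */
    DO j = 1 TO DIM(secDates);
      IF NOT MISSING(secDates{j})
         AND ABS(drugDates{i} - secDates{j}) <= 120 THEN matched = 1;
    END;
    /* compare this fill date with every outpatient diagnosis date */
    DO j = 1 TO DIM(outptDates);
      IF NOT MISSING(outptDates{j})
         AND ABS(drugDates{i} - outptDates{j}) <= 120 THEN matched = 1;
    END;
    IF matched THEN DO;
      DrugNearClaim = 1;
      /* MIN ignores missing arguments, so this keeps the earliest matched fill */
      FirstMatchedDrugDate = MIN(FirstMatchedDrugDate, drugDates{i});
    END;
  END;
  DROP i j matched;
RUN;
```

This version checks every pair of dates, so it does not depend on the dates being in calendar order. The explicit MISSING tests matter because SAS treats a missing value as smaller than any number, which would otherwise make an empty slot look like a match.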
However, English is a bit fuzzy about exactly how OR is meant (inclusive or exclusive) and whether comparisons should be distributed across it.
For instance for
If DMDrug =1 AND If DMDrugDate within +/- 120 days of InptSecDMDx_Date OR within +/- 120 days of OutptDMDate then DiabetesOutcome = 1;
Do you do the comparison with OutptDMDate only if DMDrug=1? Also, which order of comparison is preferable: compare all of the DMDrugDate values with one value of InptSecDMDx_Date, or compare one value of DMDrugDate with each value of InptSecDMDx_Date before considering the next DMDrugDate, with some similar order of comparisons for the OutptDMDate choices? Is ANY value within that 120 days acceptable? Or are you looking for the earliest calendar date, or the one closest to the diagnosis date?