Hello!
I would like to make sure my SAS code is correct in identifying only the first occurrence of either of two outcomes, dka or hhs (meaning the first encounter where either outcome occurs). I would like to also identify the first composite outcomes (where both outcomes occur in the same encounter). Outcomes will be analyzed separately in a case control study.
I included mock datasets samples for your convenience (MALIB.DKA_HSS_SORTA6 and malib.DKA_HHS_first) and I indicated this clearly in the attached SAS code. Thanks!
Columns in the input dataset which only has dka and/or hhs cases based on diagnosisIDs:
UniqueID: unique identifier specific to the patient.
encounterID: hospital visit associated with the patient. Each patient can have multiple encounters throughout his existence in the cohort as they were admitted/discharged multiple times.
dka_event: binary 0 or 1, event occurred or not for the patient in a specific encounter.
hhs_event: binary 0 or 1, event occurred or not for the patient in a specific encounter.
Discharged_date: The date the patient was discharged for that encounter.
DiagnosisID: diagnosis_ids for either of the two outcomes: dka or hhs. There are multiple ids for each outcome and it is possible one encounter lists multiple ids. The IDs for either outcome are different.
Alldischarge: formatted dischaged_date.
Output dataset:
Same as above and these two addiitonal columns:
Dka_date: based on alldischarged date which is the date of first occurrence
HHS_date: based on alldischarged date which is the date of first occurrence
No errors in code when I ran it each step or sql.
In the final datasets, there are duplicateIDs, but different encountersIDs for each patient.
Thanks so much for all your help!
Nisrine
/*Acquire diagnosisIDS */
proc sql;
create table malib.dkahhscases as
select encounterID,
diagnosisID
from work.f_diagnosis
where diagnosisID in (/*DKA*/ 5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201,
147333, 140368, 16755 , 16756,
/*HHS event*/ 102941, 556, 5507, 5508 , 140966, 16758, 16757);
quit;
/*Create binary variables 0 and 1 for outcomes of interest*/
proc sql;
create table malib.DKA_HHS as
select *,
/*DKA event*/
(case when diagnosisID in (5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201,
147333, 140368, 16755 , 16756 ) then 1 else 0 end) as DKA_event,
/*HHS event*/
(case when diagnosisID in (102941, 556, 5507, 5508 , 140966, 16758, 16757 ) then 1 else 0 end) as HHS_event
from malib.dkahhscases;
quit;
/*check if duplicate encounters but don’t use- there are duplicate encounters but don’t use the deduped dataset*/
proc sort data = malib.dka_hhs out =duplicate1 nodupkey;
by encounterID;
run;
/*sort by encounterID*/
proc sort data = malib.dka_hhs out = malib.dka_hhs_S;
by encounterID;
run;
/*Collapse Rows to Get Unique Encounters - Keep Max Value for diagnosisID—because one encounter may have multiple diagnosisIDs for either outcome and I don’t want to loose any cases*/
proc sql;
create table malib.dka_hhs_C as
select encounterID, diagnosisID,
max(dka_event) as dka_event,
max(hhs_event) as hhs_event
from malib.dka_hhs_S
group by encounterID;
quit; /* grouped by */
/*this is a final table of only DKA and HHS events in the full database - when encounterID is not deduped*/
title 'proc freq - unique encounters with diagnosisIDs of ineterst';
proc freq data=malib.dka_hhs_C;
table DKA_event hhs_event;
run; /*DKA and HHS frequencies are higher if I don’t dedupe by encounter_ids*/
/*Dedupe by encounterID now that I collapsed*/
proc sort data = malib.dka_hhs_C out = malib.dka_hhs_CD nodupkey;
by encounterID;
run;
/*this is a table of only DKA and HHS events in full dataset*/
title 'proc freq - unique encounters with diagnosisIDs of ineterst if encounterID is deduped';
proc freq data=malib.dka_hhs_CD;
table DKA_event hhs_event;
run; /**/
/*USE malib.dka_hhs_CD*/
/*Inner Join Acquire Encounters for Adult Inpatients with
DKA or HHS*/
/*link to encounter_patient to get the dischaged_date*/
proc sql;
create table malib.dka_hss_sortA2 as
select a.*, b.dka_event, b.hhs_event, b.diagnosisID
from nhlib.enc_patient as a inner join malib.dka_hhs_CD as b
on a.encounterID =b.encounterID;
quit; /*74710 rows and 6 columns.
*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey;
by encounterID;
run; /*zero duplicate values so deleted malib.dka_hss_sortA3 - it was a quality check only*/
*In full dataset but only with a dataset with DKA and HHS;
Title 'DKA frequency in full dataset inpatients';
proc freq data = malib.dka_hss_sortA2;
table dka_event;
run; *;
proc freq data = malib.dka_hss_sortA2;
table HHS_event;
run; *;
Title 'UniqueID - malib.dka_hss_sortA2';
proc sql;
select count(distinct UniqueID) from malib.dka_hss_sortA2;
quit; *;
Title 'encounter sk - malib.dka_hss_sortA2';
proc sql;
select count(distinct encounterID) from malib.dka_hss_sortA2;
quit; /**/
/* Are there missing UniqueIDs*/
data malib.dka_hss_sortA2B;
set malib.dka_hss_sortA2;
if UniqueID = . then output;
run; /*Some missing Unique IDs to delete later */
/*Testing to see if there are duplicate sks and there are, but no duplicate encounterIDs*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey;
by UniqueID;
run; /*don't use , only for checking purposes - dataset deleted*/
/*run without deduping by UniqueID*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3;
by UniqueID;
run; /*one patient with different encounters but seems to be the same diagnosisID-- I collapsed above*/
data malib.dka_hss_sortA4;
set malib.dka_hss_sortA3;
if UniqueID = . then delete;
run;
/*deleted missing UniqueIDs*/
/*Acquire first occurrence*/
/* Sort data by UniqueID and discharged_date */
proc sort data=MALIB.DKA_HSS_SORTA4;
by UniqueID dischaged_date;
run; /*74325 observations and 6 variables*/
/*Reorder Varariables*/
proc sql;
create table MALIB.DKA_HSS_SORTA5 as
select UniqueID, encounterID, dischaged_date, dka_event,
hhs_event , diagnosisID
from MALIB.DKA_HSS_SORTA4;
quit; /*
*/
/*sort again*/
proc sort data = MALIB.DKA_HSS_SORTA5;
by UniqueID dka_event dischaged_date ;
run;
/*INPUT DATASET – included a sample*/
/*all discharge is the formatted dischaged_date */
data MALIB.DKA_HSS_SORTA6;
set MALIB.DKA_HSS_SORTA5;
alldischarge = datepart(dischaged_date);
format alldischarge YYMMDD8.;
run;
/*Output DATASET – included a sample*/
/*Get first occurrence */
data malib.DKA_HHS_first;
set MALIB.DKA_HSS_SORTA6;
by UniqueID;
if first.UniqueID and dka_event=1 then DKA_date = alldischarge;
if first.UniqueID and hhs_event=1 then hhs_date = alldischarge;
run; *it worked - 74325 observations and 7 variables;
/*Is the above code for first occurrence, correct? */
/*how can I only keep the first occurrence?*/
/*Composite Outcome*/
data work.composite;
set MALIB.DKA_HSS_SORTA6;
by UniqueID;
if dka_event = 1 and hhs_event = 1 then output;
run;
/*How can I get the composite outcome based on first occurrence?*/
... View more