Hello!
I would like to make sure my SAS code is correct in identifying only the first occurrence of either of two outcomes, dka or hhs (meaning the first encounter where either outcome occurs). I would like to also identify the first composite outcomes (where both outcomes occur in the same encounter). Outcomes will be analyzed separately in a case control study.
I included mock datasets samples for your convenience (MALIB.DKA_HSS_SORTA6 and malib.DKA_HHS_first) and I indicated this clearly in the attached SAS code. Thanks!
Columns in the input dataset which only has dka and/or hhs cases based on diagnosisIDs:
Output dataset:
Same as above and these two addiitonal columns:
No errors in code when I ran it each step or sql.
In the final datasets, there are duplicateIDs, but different encountersIDs for each patient.
Thanks so much for all your help!
Nisrine
/*Acquire diagnosisIDS */
proc sql;
create table malib.dkahhscases as
select encounterID,
diagnosisID
from work.f_diagnosis
where diagnosisID in (/*DKA*/ 5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201,
147333, 140368, 16755 , 16756,
/*HHS event*/ 102941, 556, 5507, 5508 , 140966, 16758, 16757);
quit;
/*Create binary variables 0 and 1 for outcomes of interest*/
proc sql;
create table malib.DKA_HHS as
select *,
/*DKA event*/
(case when diagnosisID in (5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201,
147333, 140368, 16755 , 16756 ) then 1 else 0 end) as DKA_event,
/*HHS event*/
(case when diagnosisID in (102941, 556, 5507, 5508 , 140966, 16758, 16757 ) then 1 else 0 end) as HHS_event
from malib.dkahhscases;
quit;
/*check if duplicate encounters but don’t use- there are duplicate encounters but don’t use the deduped dataset*/
proc sort data = malib.dka_hhs out =duplicate1 nodupkey;
by encounterID;
run;
/*sort by encounterID*/
proc sort data = malib.dka_hhs out = malib.dka_hhs_S;
by encounterID;
run;
/*Collapse Rows to Get Unique Encounters - Keep Max Value for diagnosisID—because one encounter may have multiple diagnosisIDs for either outcome and I don’t want to loose any cases*/
proc sql;
create table malib.dka_hhs_C as
select encounterID, diagnosisID,
max(dka_event) as dka_event,
max(hhs_event) as hhs_event
from malib.dka_hhs_S
group by encounterID;
quit; /* grouped by */
/*this is a final table of only DKA and HHS events in the full database - when encounterID is not deduped*/
title 'proc freq - unique encounters with diagnosisIDs of ineterst';
proc freq data=malib.dka_hhs_C;
table DKA_event hhs_event;
run; /*DKA and HHS frequencies are higher if I don’t dedupe by encounter_ids*/
/*Dedupe by encounterID now that I collapsed*/
proc sort data = malib.dka_hhs_C out = malib.dka_hhs_CD nodupkey;
by encounterID;
run;
/*this is a table of only DKA and HHS events in full dataset*/
title 'proc freq - unique encounters with diagnosisIDs of ineterst if encounterID is deduped';
proc freq data=malib.dka_hhs_CD;
table DKA_event hhs_event;
run; /**/
/*USE malib.dka_hhs_CD*/
/*Inner Join Acquire Encounters for Adult Inpatients with
DKA or HHS*/
/*link to encounter_patient to get the dischaged_date*/
proc sql;
create table malib.dka_hss_sortA2 as
select a.*, b.dka_event, b.hhs_event, b.diagnosisID
from nhlib.enc_patient as a inner join malib.dka_hhs_CD as b
on a.encounterID =b.encounterID;
quit; /*74710 rows and 6 columns.
*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey;
by encounterID;
run; /*zero duplicate values so deleted malib.dka_hss_sortA3 - it was a quality check only*/
*In full dataset but only with a dataset with DKA and HHS;
Title 'DKA frequency in full dataset inpatients';
proc freq data = malib.dka_hss_sortA2;
table dka_event;
run; *;
proc freq data = malib.dka_hss_sortA2;
table HHS_event;
run; *;
Title 'UniqueID - malib.dka_hss_sortA2';
proc sql;
select count(distinct UniqueID) from malib.dka_hss_sortA2;
quit; *;
Title 'encounter sk - malib.dka_hss_sortA2';
proc sql;
select count(distinct encounterID) from malib.dka_hss_sortA2;
quit; /**/
/* Are there missing UniqueIDs*/
data malib.dka_hss_sortA2B;
set malib.dka_hss_sortA2;
if UniqueID = . then output;
run; /*Some missing Unique IDs to delete later */
/*Testing to see if there are duplicate sks and there are, but no duplicate encounterIDs*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey;
by UniqueID;
run; /*don't use , only for checking purposes - dataset deleted*/
/*run without deduping by UniqueID*/
proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3;
by UniqueID;
run; /*one patient with different encounters but seems to be the same diagnosisID-- I collapsed above*/
data malib.dka_hss_sortA4;
set malib.dka_hss_sortA3;
if UniqueID = . then delete;
run;
/*deleted missing UniqueIDs*/
/*Acquire first occurrence*/
/* Sort data by UniqueID and discharged_date */
proc sort data=MALIB.DKA_HSS_SORTA4;
by UniqueID dischaged_date;
run; /*74325 observations and 6 variables*/
/*Reorder Varariables*/
proc sql;
create table MALIB.DKA_HSS_SORTA5 as
select UniqueID, encounterID, dischaged_date, dka_event,
hhs_event , diagnosisID
from MALIB.DKA_HSS_SORTA4;
quit; /*
*/
/*sort again*/
proc sort data = MALIB.DKA_HSS_SORTA5;
by UniqueID dka_event dischaged_date ;
run;
/*INPUT DATASET – included a sample*/
/*all discharge is the formatted dischaged_date */
data MALIB.DKA_HSS_SORTA6;
set MALIB.DKA_HSS_SORTA5;
alldischarge = datepart(dischaged_date);
format alldischarge YYMMDD8.;
run;
/*Output DATASET – included a sample*/
/*Get first occurrence */
data malib.DKA_HHS_first;
set MALIB.DKA_HSS_SORTA6;
by UniqueID;
if first.UniqueID and dka_event=1 then DKA_date = alldischarge;
if first.UniqueID and hhs_event=1 then hhs_date = alldischarge;
run; *it worked - 74325 observations and 7 variables;
/*Is the above code for first occurrence, correct? */
/*how can I only keep the first occurrence?*/
/*Composite Outcome*/
data work.composite;
set MALIB.DKA_HSS_SORTA6;
by UniqueID;
if dka_event = 1 and hhs_event = 1 then output;
run;
/*How can I get the composite outcome based on first occurrence?*/
Since you start with dataset f_diagnosis, you should post an example for this, and for the final outcome. I have a strong feeling that the whole process can be streamlined into a few steps.
Ideally, post data as a DATA step with DATALINES in a code box, so we can recreate your dataset with a simple copy/paste and submit. If you attach a csv file, we must
so don't be surprised if it takes time before someone answers.
Thanks for the advice.
The input dataset below is based on multiple datasets that are very large. This is just a sample.
Again, I would like to only keep the first occurrence for each outcome and the combined outcome - we can call it (meaning when an encounter has both dka and hhs.
Is this what you mean? I inserted the data and the code.
/*INPUT DATASET – included a sample*/
/*all discharge is the formatted dischaged_date */
data MALIB.DKA_HSS_SORTA6;
set MALIB.DKA_HSS_SORTA5; /*Sort 5A- sorted by UniqueID dka_event and discharged_date*/
alldischarge = datepart(dischaged_date);
format alldischarge YYMMDD8.;
run;
Data MALIB.DKA_HSS_SORTA6;
Set MALIB.DKA_HSS_SORTA5;
Infile uniqueID encounterID discharged_date dka_event hhs_event diagnosis_id alldischarge
1 1 25AUG2012:00:30:00.000000 0 1 5507 2012-08-25
2 2 28MAR2014:11:59:00.000000 1 0 5503 2014-03-28
2 3 29MAY2014:17:30:00.000000 1 0 5504 2014-05-29
2 4 12JUN2014:22:30:00.000000 1 0 5504 2014-06-12
2 5 19JUN2014:11:43:00.000000 1 0 5504 2014-06-19
2 6 17JUL2014:14:39:00.000000 1 0 5504 2014-07-17
2 7 29JUL2014:18:39:00.000000 1 0 5504 2014-07-29
2 8 06AUG2014:21:00:00.000000 1 0 5504 2014-08-06
2 9 28SEP2014:14:12:00.000000 1 0 5504 2014-09-28
2 10 28OCT2014:16:27:00.000000 1 0 5504 2014-10-28
2 11 03NOV2014:12:10:00.000000 1 0 140368 2014-11-03
2 12 15NOV2014:13:00:00.000000 1 0 5504 2014-11-15
2 13 03DEC2014:15:59:00.000000 1 0 5504 2014-12-03
2 14 23JAN2015:14:15:00.000000 1 0 5504 2015-01-23
2 15 21FEB2015:15:28:00.000000 1 0 5504 2015-02-21
2 16 06MAY2015:12:00:00.000000 1 0 5504 2015-05-06
2 17 06JUN2015:00:50:00.000000 1 0 5504 2015-06-06
2 18 10SEP2015:21:30:00.000000 1 0 140368 2015-09-10
3 19 13JUN2011:14:41:00.000000 1 0 5504 2011-06-13
3 20 19SEP2011:19:30:00.000000 1 0 5504 2011-09-19
3 21 12APR2013:16:05:00.000000 1 0 5504 2013-04-12
3 22 01SEP2016:14:29:00.000000 1 0 142091 2016-09-01
4 23 17AUG2012:18:05:00.000000 0 1 5507 2012-08-17
4 24 12SEP2012:18:59:00.000000 0 1 5507 2012-09-12
4 25 12OCT2012:14:12:00.000000 0 1 5507 2012-10-12
4 26 28FEB2013:13:42:00.000000 0 1 5507 2013-02-28
4 27 27NOV2012:13:42:00.000000 1 0 5503 2012-11-27
4 28 11JAN2013:16:20:00.000000 1 0 5503 2013-01-11
4 29 13JUL2013:16:22:00.000000 1 0 5503 2013-07-13
5 30 22AUG2010:16:48:00.000000 0 1 5507 2010-08-22
5 31 04MAR2013:17:07:00.000000 0 1 5507 2013-03-04
5 32 07JUL2007:14:45:00.000000 1 0 5504 2007-07-07
5 33 14JUL2009:17:20:00.000000 1 0 5503 2009-07-14
5 34 07JUL2011:18:34:00.000000 1 0 5504 2011-07-07
6 35 14JUN2016:13:30:00.000000 1 0 140368 2016-06-14
6 36 07DEC2016:18:00:00.000000 1 0 140368 2016-12-07
7 37 14FEB2013:10:55:00.000000 1 0 5504 2013-02-14
7 38 09AUG2013:18:00:00.000000 1 0 5504 2013-08-09
8 39 08NOV2015:15:15:00.000000 1 1 140966 2015-11-08
Run;
Output dataset
/*Get first occurrence */
Data malib.DKA_HHS_first;
Set MALIB.DKA_HSS_SORTA6;
Infile uniqueID encounterID discharged_date dka_event hhs_event diagnosis_id alldischarge dka_date hhs_date
1 1 25AUG2012:00:30:00.000000 0 1 5507 2012-08-25 19230
2 2 28MAR2014:11:59:00.000000 1 0 5503 2014-03-28 19810
2 3 29MAY2014:17:30:00.000000 1 0 5504 2014-05-29
2 4 12JUN2014:22:30:00.000000 1 0 5504 2014-06-12
2 5 19JUN2014:11:43:00.000000 1 0 5504 2014-06-19
2 6 17JUL2014:14:39:00.000000 1 0 5504 2014-07-17
2 7 29JUL2014:18:39:00.000000 1 0 5504 2014-07-29
2 8 06AUG2014:21:00:00.000000 1 0 5504 2014-08-06
2 9 28SEP2014:14:12:00.000000 1 0 5504 2014-09-28
2 10 28OCT2014:16:27:00.000000 1 0 5504 2014-10-28
2 11 03NOV2014:12:10:00.000000 1 0 140368 2014-11-03
2 12 15NOV2014:13:00:00.000000 1 0 5504 2014-11-15
2 13 03DEC2014:15:59:00.000000 1 0 5504 2014-12-03
2 14 23JAN2015:14:15:00.000000 1 0 5504 2015-01-23
2 15 21FEB2015:15:28:00.000000 1 0 5504 2015-02-21
2 16 06MAY2015:12:00:00.000000 1 0 5504 2015-05-06
2 17 06JUN2015:00:50:00.000000 1 0 5504 2015-06-06
2 18 10SEP2015:21:30:00.000000 1 0 140368 2015-09-10
3 19 13JUN2011:14:41:00.000000 1 0 5504 2011-06-13 18791
3 20 19SEP2011:19:30:00.000000 1 0 5504 2011-09-19
3 21 12APR2013:16:05:00.000000 1 0 5504 2013-04-12
3 22 01SEP2016:14:29:00.000000 1 0 142091 2016-09-01
4 23 17AUG2012:18:05:00.000000 0 1 5507 2012-08-17 19222
4 24 12SEP2012:18:59:00.000000 0 1 5507 2012-09-12
4 25 12OCT2012:14:12:00.000000 0 1 5507 2012-10-12
4 26 28FEB2013:13:42:00.000000 0 1 5507 2013-02-28
4 27 27NOV2012:13:42:00.000000 1 0 5503 2012-11-27
4 28 11JAN2013:16:20:00.000000 1 0 5503 2013-01-11
4 29 13JUL2013:16:22:00.000000 1 0 5503 2013-07-13
5 30 22AUG2010:16:48:00.000000 0 1 5507 2010-08-22 18496
5 31 04MAR2013:17:07:00.000000 0 1 5507 2013-03-04
5 32 07JUL2007:14:45:00.000000 1 0 5504 2007-07-07
5 33 14JUL2009:17:20:00.000000 1 0 5503 2009-07-14
5 34 07JUL2011:18:34:00.000000 1 0 5504 2011-07-07
6 35 14JUN2016:13:30:00.000000 1 0 140368 2016-06-14 20619
6 36 07DEC2016:18:00:00.000000 1 0 140368 2016-12-07
7 37 14FEB2013:10:55:00.000000 1 0 5504 2013-02-14 19403
7 38 09AUG2013:18:00:00.000000 1 0 5504 2013-08-09
8 39 08NOV2015:15:15:00.000000 1 1 140966 2015-11-08 20400 20400
Run;
/*Composite Outcome*/
data work.composite;
set MALIB.DKA_HSS_SORTA6;
by UniqueID;
if dka_event = 1 and hhs_event = 1 then output;
run;
I asked explicitly for your initial dataset f_diagnosis.
Thank you. My apologies for not getting back to you earlier. I did not have readily access to the big datasets and I had to get it at a different location. Included is the f_diagnosis. Thank you.
Data malib.dkahhscases;
Set f_diagnosis
Infile encounter_id diagnosis_id
32 755
32 5478
32 5504
32 5525
32 5620
32 5930
32 6324
32 6401
32 7278
33 755
33 782
33 16118
33 5503
33 5617
33 5778
33 5789
33 7278
33 7907
30 755
30 782
30 1342
30 1580
30 2132
30 5482
30 5507
30 5652
30 16118
34 11205
19 626
20 1580
19 1580
20 1604
20 1655
19 1604
19 1604
19 2473
20 4184
20 16118
20 5504
20 5504
20 5504
34 5499
34 5504
19 5501
19 5504
19 5504
19 5533
20 5648
20 5653
19 5620
19 5648
23 17426
24 5916
24 5916
1 5648
1 5648
1 626
1 626
23 626
24 1940
23 14935
23 755
23 5500
23 5507
23 5507
27 16029
1 782
1 782
24 15786
1 5876
1 5876
25 16029
27 7278
27 16118
25 16118
27 16132
24 7278
24 2212
24 2212
25 7294
27 2463
25 16211
24 16118
23 5641
27 16290
1 6401
1 6401
25 16295
25 7389
23 5666
25 11205
24 2463
24 2463
27 11205
24 16295
1 1580
1 1580
23 5789
23 5854
24 11205
23 1580
1 16091
1 16091
24 16635
24 16635
24 3210
24 7583
23 6102
25 4171
25 4564
1 7278
1 7278
24 3575
24 3575
24 4138
24 7907
24 7907
25 14915
25 14915
27 14915
25 14935
27 14935
25 14965
25 14965
1 16910
1 16910
23 7278
27 5500
27 5503
27 5503
27 5504
25 5507
25 5507
24 620
25 755
25 755
24 14915
24 14935
23 16133
24 755
24 755
24 5507
24 5507
24 782
24 782
23 16298
27 5652
23 11205
25 5648
27 5665
25 5652
1 5482
1 5482
24 9668
24 9713
24 9713
25 5665
27 5854
25 5854
25 5854
24 5665
27 1580
1 5507
1 5507
1 5524
1 5524
25 5916
24 5819
26 755
26 755
38 149
37 2437
26 782
26 782
31 9976
31 9976
31 16118
26 16118
26 16118
21 16132
31 16132
28 755
26 7278
26 7278
30 755
30 755
38 580
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
38 3611
31 14439
38 16238
38 620
38 621
38 621
38 621
38 621
31 626
28 782
38 626
38 626
38 626
38 626
38 626
28 16118
38 4308
30 16118
28 16132
26 16292
26 16292
31 16366
28 7278
31 755
37 755
26 14915
31 14915
38 755
26 14935
26 14935
31 782
38 7324
30 7278
38 7350
31 16637
31 16637
31 5478
37 11148
28 16294
38 5482
38 5482
38 5482
38 5482
31 7363
31 7363
30 16295
38 7363
31 969
37 7278
28 14915
30 16330
31 1089
31 1089
30 14915
30 14915
38 7378
30 14935
28 14965
38 7412
30 16626
38 7431
31 15177
31 15177
28 7363
38 5498
38 5504
38 5504
38 5504
38 5504
38 5504
26 5507
26 5507
26 5507
31 5507
37 9719
31 7563
31 7563
28 1580
38 2424
31 7618
31 7618
38 1381
38 1381
38 1381
38 1381
30 1580
37 5482
31 5620
28 5498
31 2444
31 2444
30 5503
30 5504
30 5504
30 5504
38 5620
28 5503
28 5505
38 1580
38 1580
31 1593
38 5641
38 5641
31 11156
37 15858
38 7789
38 7789
38 7789
38 7789
38 7789
28 5620
37 5504
26 5665
26 5665
31 7907
31 7907
31 5690
38 5721
38 5721
37 7626
37 5620
37 16118
30 5648
30 5649
26 5854
26 5854
30 5652
28 5665
31 5916
31 5916
38 5916
38 5916
38 5916
38 8221
38 8221
38 8221
21 5500
30 5666
30 7907
21 5504
21 5504
21 5504
21 5504
26 11205
26 11205
21 5530
31 2133
30 5825
38 15690
38 15690
38 15690
38 15690
28 5854
38 15695
38 15695
38 15695
38 15700
38 15787
37 5721
37 14935
28 11205
21 7634
21 236
21 11145
21 7647
21 5620
21 5648
21 5648
21 5653
21 2132
21 11163
21 8095
21 11191
21 2444
21 16029
21 16029
8 16029
5 7429
8 5620
7 5620
2 2479
2 2508
2 11137
12 7429
4 10345
10 16029
2 16366
9 5620
8 16118
10 5620
11 5620
13 5620
8 14919
5 16118
8 14935
7 626
12 16118
7 14935
2 1281
12 5648
3 5641
10 5648
4 626
11 16118
12 5649
3 5648
13 626
12 5652
3 14919
10 14935
9 10495
13 5652
13 5653
11 14935
13 14935
5 14935
12 14935
2 11150
9 16118
10 5689
12 5689
7 2464
11 5689
4 5702
4 5704
13 11096
3 5704
9 626
3 2463
3 5721
6 620
8 16366
7 16368
13 7730
2 1381
2 1381
9 14935
6 626
6 5504
6 5504
5 16368
10 16368
12 16368
12 16472
2 7268
11 16368
2 7276
13 5916
13 11144
7 8024
13 3452
2 149
2 149
11 3452
12 3452
7 11156
5 15311
11 11156
11 15311
3 1381
6 5620
9 16368
2 620
2 623
2 5503
13 1580
2 15786
7 1604
11 107024
2 755
11 132782
7 11206
4 11205
11 139829
11 140368
11 140368
11 140513
11 141508
7 7278
8 7278
11 189124
11 189144
11 191236
11 193831
6 5641
2 7429
2 7431
6 14919
12 7278
13 7278
10 7278
11 7278
6 14935
2 16118
2 5648
2 16132
7 5504
7 5504
7 5504
8 5498
2 7634
2 14935
8 5504
8 5504
4 5504
10 5504
8 2132
13 5504
5 5504
5 5504
2 5689
12 5504
12 5504
3 5501
3 5504
11 5504
11 5504
5 2132
2 5721
2 2463
2 7672
9 15311
6 5744
6 16306
6 1381
6 16368
9 7278
6 8161
9 5504
9 5504
6 11205
18 140368
18 140368
39 140966
39 140966
18 140513
39 140996
39 140997
39 141001
18 140992
39 141084
39 141084
17 15292
39 142091
18 10340
18 143215
39 104576
39 104576
17 11143
15 10495
15 15311
15 11131
17 5504
16 5504
16 5504
17 1580
39 111181
17 7789
17 5620
16 5620
39 113378
17 5641
17 8111
17 5648
15 5504
15 5504
17 5652
18 5501
18 5504
18 5504
16 16132
39 131180
15 5620
17 5721
39 193330
18 193831
39 193831
17 621
18 7907
18 5620
17 626
16 16306
15 5648
15 5652
17 16366
16 16368
15 16118
17 16384
14 7278
18 16118
18 132918
15 5721
18 5689
18 5720
18 16132
16 2457
16 2463
17 14935
16 14935
15 16368
18 16368
14 620
17 3452
16 3452
17 7278
16 7278
15 14935
18 2464
18 139829
14 626
14 10495
14 11131
14 1381
14 5504
14 5504
14 5504
14 7730
14 5620
14 16118
14 16132
14 16132
14 5689
14 16133
14 16368
14 14935
35 140992
35 146722
22 140992
22 140997
36 146722
22 141002
35 104576
36 190702
35 132289
36 104576
35 191234
22 141534
35 105951
22 142091
22 142091
22 142091
22 132782
35 193831
36 132914
36 193831
22 193831
36 188106
35 107031
36 133515
22 139829
36 134001
35 140136
22 139871
35 140368
35 140368
35 140368
35 140368
22 140190
36 140368
36 140368
36 140368
run;
Do you really record "discharge date" information to the millionth of a second? Tell who ever creates that data the extra zeroes are just plain ugly and don't carry any significant information and truncate to seconds. Personally if the value carries any time component then the variable should not have a "date" name but a "datetime" so you can sort of expect to see a time component.
Don't forget to format your data variables in SAS.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.