About Schtroumpfette

Schtroumpfette · ‎04-29-2023

Thanks for the input! This is very helpful. I am not getting the warning with the latest code I am using, but it is clearer now. Thanks for tagging them.

Schtroumpfette · ‎04-29-2023

Hi Paige, I ran this code and it seems to have worked but would really like to have your thoughts! ods output ParameterEstimates=estimates_0_1; proc logistic data=malib.sk_collapseB; class new_CMI_One; model Censor(event='1') = halop_only new_CMI_One / clparm=wald; where new_CMI_One=1; estimate 'CMI:0-1' halop_only 0; strata rs; run; ods output ParameterEstimates=estimates_2_4; proc logistic data=malib.sk_collapseB; class new_CMI_One; model Censor(event='1') = halop_only new_CMI_One / clparm=wald; where new_CMI_One=2; estimate 'CMI:2-4' halop_only 0; strata rs; run; ods output ParameterEstimates=estimates_5plus; proc logistic data=malib.sk_collapseB; class new_CMI_One; model Censor(event='1') = halop_only new_CMI_One / clparm=wald; where new_CMI_One=3; estimate 'CMI:5+' halop_only 0; strata rs; run;

Schtroumpfette · ‎04-29-2023

Thanks, Here is a sample: Logistic Regression Estimates by CMI Category The FREQ Procedure Frequency Percent Row Pct Col Pct Table of rs by new_CMI_One rs new_CMI_One 1 2 3 Total 1 2 1.33 33.33 2.56 2 1.33 33.33 3.39 2 1.33 33.33 15.38 6 4.00 4 2 1.33 33.33 2.56 4 2.67 66.67 6.78 0 0.00 0.00 0.00 6 4.00 5 3 2.00 50.00 3.85 2 1.33 33.33 3.39 1 0.67 16.67 7.69 6 4.00 6 3 2.00 50.00 3.85 3 2.00 50.00 5.08 0 0.00 0.00 0.00 6 4.00

Schtroumpfette · ‎04-29-2023

It varies, the value for new_one_cmi can be 1, 2 or 3 for each risk set. new_cmi_one represents comorbidities so each patient in RS (1 case and 4 controls) can have different comorbidity levels. I attached a sample in csv but would be happy to send it differently. new_cmi_one (1) is 0-1 comorbidities new_cmi_one (2) 2-4 comorbidities new_cmi_one (3) 5+ comorbidities They are not always the same for each matched case and control. They were matched on age and sex. e.g. New_cmi_one rs 1 1 1 1 2 1 2 1 3 1 3 1

Schtroumpfette · ‎04-29-2023

Thanks Page, I will have to investigate the warning, but I have 3 new_CMI_levels (and checked by proc freq), so not sure what the issue is. What would the code be if I want to run it for all three CMI levels. Is this code correct? proc logistic data=test; class new_CMI_One; model Censor(event='1') = halop_only new_CMI_One / clparm=wald; estimate 'CMI:0-1' new_CMI_One 1 -1 0 halop_only 0; estimate 'CMI:0-1 vs. 5+' new_CMI_One 1 0 -1 halop_only 0; estimate 'CMI:2-4 vs. 5+' new_CMI_One 0 1 -1 halop_only 0; strata rs; run;

Schtroumpfette · ‎04-29-2023

Thanks Paige, I am not getting an error but a warning. I want to be sure the code is correct, please. 134 135 proc logistic data=test; 136 class new_CMI_One; 137 model Censor(event='1') = halop_only new_CMI_One / clparm=wald; 138 estimate 'CMI:0-1' new_CMI_One 1 -1 0 halop_only 0; 139 strata rs; 140 run; NOTE: PROC LOGISTIC is modeling the probability that censor=1. NOTE: Convergence criterion (GCONV=1E-8) satisfied. WARNING: More coefficients than levels specified for effect new_CMI_One. Some coefficients will be ignored. NOTE: There were 150 observations read from the data set WORK.TEST. NOTE: PROCEDURE LOGISTIC used (Total process time): real time 0.65 seconds cpu time 0.07 seconds

Schtroumpfette · ‎04-29-2023

Hello, I would like to run a Proc Logistic Model for the attached sample dataset, a case-control study. CMI level (1, 2 or 3) 1 is 0-1 comorbidities / 2 is 2-4 comorbidities and 3 is 5+ comorbidities that I summed up previously. All variables in the dataset are numerical (length 😎 Censor is the event (1 or 0). RS is the risk set (1 case and 4 controls each) halop_only is the exposure olanz_only is another exposure that I will test separately. Basically, I want to test CMI at different levels and present it in a table as follows: Example: Comorbidity level CMI:0–1 halop_only 1.24 (0.44–3.43) CMI:2–4 halop_only 0.40 (0.04–3.75) CMI:5 + halop_only 1.60 (0.83–3.07) Here is my code, is it correct? I am getting a warning: "WARNING: More coefficients than levels specified for effect new_CMI_One. Some coefficients will be ignored". Your help will be appreciated! Thanks, proc logistic data=test; class new_CMI_One; model Censor(event='1') = halop_only new_CMI_One / clparm=wald; estimate 'CMI:0-1' new_CMI_One 1 -1 0 halop_only 0; strata rs; run;

Schtroumpfette · ‎02-23-2023

Thank you. My apologies for not getting back to you earlier. I did not have readily access to the big datasets and I had to get it at a different location. Included is the f_diagnosis. Thank you. Data malib.dkahhscases; Set f_diagnosis Infile encounter_id diagnosis_id 32 755 32 5478 32 5504 32 5525 32 5620 32 5930 32 6324 32 6401 32 7278 33 755 33 782 33 16118 33 5503 33 5617 33 5778 33 5789 33 7278 33 7907 30 755 30 782 30 1342 30 1580 30 2132 30 5482 30 5507 30 5652 30 16118 34 11205 19 626 20 1580 19 1580 20 1604 20 1655 19 1604 19 1604 19 2473 20 4184 20 16118 20 5504 20 5504 20 5504 34 5499 34 5504 19 5501 19 5504 19 5504 19 5533 20 5648 20 5653 19 5620 19 5648 23 17426 24 5916 24 5916 1 5648 1 5648 1 626 1 626 23 626 24 1940 23 14935 23 755 23 5500 23 5507 23 5507 27 16029 1 782 1 782 24 15786 1 5876 1 5876 25 16029 27 7278 27 16118 25 16118 27 16132 24 7278 24 2212 24 2212 25 7294 27 2463 25 16211 24 16118 23 5641 27 16290 1 6401 1 6401 25 16295 25 7389 23 5666 25 11205 24 2463 24 2463 27 11205 24 16295 1 1580 1 1580 23 5789 23 5854 24 11205 23 1580 1 16091 1 16091 24 16635 24 16635 24 3210 24 7583 23 6102 25 4171 25 4564 1 7278 1 7278 24 3575 24 3575 24 4138 24 7907 24 7907 25 14915 25 14915 27 14915 25 14935 27 14935 25 14965 25 14965 1 16910 1 16910 23 7278 27 5500 27 5503 27 5503 27 5504 25 5507 25 5507 24 620 25 755 25 755 24 14915 24 14935 23 16133 24 755 24 755 24 5507 24 5507 24 782 24 782 23 16298 27 5652 23 11205 25 5648 27 5665 25 5652 1 5482 1 5482 24 9668 24 9713 24 9713 25 5665 27 5854 25 5854 25 5854 24 5665 27 1580 1 5507 1 5507 1 5524 1 5524 25 5916 24 5819 26 755 26 755 38 149 37 2437 26 782 26 782 31 9976 31 9976 31 16118 26 16118 26 16118 21 16132 31 16132 28 755 26 7278 26 7278 30 755 30 755 38 580 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 38 3611 31 14439 38 16238 38 620 38 621 38 621 38 621 38 621 31 626 28 782 38 626 38 626 38 626 38 626 38 626 28 16118 38 4308 30 16118 28 16132 26 16292 26 16292 31 16366 28 7278 31 755 37 755 26 14915 31 14915 38 755 26 14935 26 14935 31 782 38 7324 30 7278 38 7350 31 16637 31 16637 31 5478 37 11148 28 16294 38 5482 38 5482 38 5482 38 5482 31 7363 31 7363 30 16295 38 7363 31 969 37 7278 28 14915 30 16330 31 1089 31 1089 30 14915 30 14915 38 7378 30 14935 28 14965 38 7412 30 16626 38 7431 31 15177 31 15177 28 7363 38 5498 38 5504 38 5504 38 5504 38 5504 38 5504 26 5507 26 5507 26 5507 31 5507 37 9719 31 7563 31 7563 28 1580 38 2424 31 7618 31 7618 38 1381 38 1381 38 1381 38 1381 30 1580 37 5482 31 5620 28 5498 31 2444 31 2444 30 5503 30 5504 30 5504 30 5504 38 5620 28 5503 28 5505 38 1580 38 1580 31 1593 38 5641 38 5641 31 11156 37 15858 38 7789 38 7789 38 7789 38 7789 38 7789 28 5620 37 5504 26 5665 26 5665 31 7907 31 7907 31 5690 38 5721 38 5721 37 7626 37 5620 37 16118 30 5648 30 5649 26 5854 26 5854 30 5652 28 5665 31 5916 31 5916 38 5916 38 5916 38 5916 38 8221 38 8221 38 8221 21 5500 30 5666 30 7907 21 5504 21 5504 21 5504 21 5504 26 11205 26 11205 21 5530 31 2133 30 5825 38 15690 38 15690 38 15690 38 15690 28 5854 38 15695 38 15695 38 15695 38 15700 38 15787 37 5721 37 14935 28 11205 21 7634 21 236 21 11145 21 7647 21 5620 21 5648 21 5648 21 5653 21 2132 21 11163 21 8095 21 11191 21 2444 21 16029 21 16029 8 16029 5 7429 8 5620 7 5620 2 2479 2 2508 2 11137 12 7429 4 10345 10 16029 2 16366 9 5620 8 16118 10 5620 11 5620 13 5620 8 14919 5 16118 8 14935 7 626 12 16118 7 14935 2 1281 12 5648 3 5641 10 5648 4 626 11 16118 12 5649 3 5648 13 626 12 5652 3 14919 10 14935 9 10495 13 5652 13 5653 11 14935 13 14935 5 14935 12 14935 2 11150 9 16118 10 5689 12 5689 7 2464 11 5689 4 5702 4 5704 13 11096 3 5704 9 626 3 2463 3 5721 6 620 8 16366 7 16368 13 7730 2 1381 2 1381 9 14935 6 626 6 5504 6 5504 5 16368 10 16368 12 16368 12 16472 2 7268 11 16368 2 7276 13 5916 13 11144 7 8024 13 3452 2 149 2 149 11 3452 12 3452 7 11156 5 15311 11 11156 11 15311 3 1381 6 5620 9 16368 2 620 2 623 2 5503 13 1580 2 15786 7 1604 11 107024 2 755 11 132782 7 11206 4 11205 11 139829 11 140368 11 140368 11 140513 11 141508 7 7278 8 7278 11 189124 11 189144 11 191236 11 193831 6 5641 2 7429 2 7431 6 14919 12 7278 13 7278 10 7278 11 7278 6 14935 2 16118 2 5648 2 16132 7 5504 7 5504 7 5504 8 5498 2 7634 2 14935 8 5504 8 5504 4 5504 10 5504 8 2132 13 5504 5 5504 5 5504 2 5689 12 5504 12 5504 3 5501 3 5504 11 5504 11 5504 5 2132 2 5721 2 2463 2 7672 9 15311 6 5744 6 16306 6 1381 6 16368 9 7278 6 8161 9 5504 9 5504 6 11205 18 140368 18 140368 39 140966 39 140966 18 140513 39 140996 39 140997 39 141001 18 140992 39 141084 39 141084 17 15292 39 142091 18 10340 18 143215 39 104576 39 104576 17 11143 15 10495 15 15311 15 11131 17 5504 16 5504 16 5504 17 1580 39 111181 17 7789 17 5620 16 5620 39 113378 17 5641 17 8111 17 5648 15 5504 15 5504 17 5652 18 5501 18 5504 18 5504 16 16132 39 131180 15 5620 17 5721 39 193330 18 193831 39 193831 17 621 18 7907 18 5620 17 626 16 16306 15 5648 15 5652 17 16366 16 16368 15 16118 17 16384 14 7278 18 16118 18 132918 15 5721 18 5689 18 5720 18 16132 16 2457 16 2463 17 14935 16 14935 15 16368 18 16368 14 620 17 3452 16 3452 17 7278 16 7278 15 14935 18 2464 18 139829 14 626 14 10495 14 11131 14 1381 14 5504 14 5504 14 5504 14 7730 14 5620 14 16118 14 16132 14 16132 14 5689 14 16133 14 16368 14 14935 35 140992 35 146722 22 140992 22 140997 36 146722 22 141002 35 104576 36 190702 35 132289 36 104576 35 191234 22 141534 35 105951 22 142091 22 142091 22 142091 22 132782 35 193831 36 132914 36 193831 22 193831 36 188106 35 107031 36 133515 22 139829 36 134001 35 140136 22 139871 35 140368 35 140368 35 140368 35 140368 22 140190 36 140368 36 140368 36 140368 run;

Schtroumpfette · ‎02-21-2023

Thanks, this is how it is recorded but I understand.

Schtroumpfette · ‎02-21-2023

Thanks for the advice. The input dataset below is based on multiple datasets that are very large. This is just a sample. UniqueID and encounterID are text variables Everything else is numerical Again, I would like to only keep the first occurrence for each outcome and the combined outcome - we can call it (meaning when an encounter has both dka and hhs. Is this what you mean? I inserted the data and the code. /*INPUT DATASET – included a sample*/ /*all discharge is the formatted dischaged_date */ data MALIB.DKA_HSS_SORTA6; set MALIB.DKA_HSS_SORTA5; /*Sort 5A- sorted by UniqueID dka_event and discharged_date*/ alldischarge = datepart(dischaged_date); format alldischarge YYMMDD8.; run; Data MALIB.DKA_HSS_SORTA6; Set MALIB.DKA_HSS_SORTA5; Infile uniqueID encounterID discharged_date dka_event hhs_event diagnosis_id alldischarge 1 1 25AUG2012:00:30:00.000000 0 1 5507 2012-08-25 2 2 28MAR2014:11:59:00.000000 1 0 5503 2014-03-28 2 3 29MAY2014:17:30:00.000000 1 0 5504 2014-05-29 2 4 12JUN2014:22:30:00.000000 1 0 5504 2014-06-12 2 5 19JUN2014:11:43:00.000000 1 0 5504 2014-06-19 2 6 17JUL2014:14:39:00.000000 1 0 5504 2014-07-17 2 7 29JUL2014:18:39:00.000000 1 0 5504 2014-07-29 2 8 06AUG2014:21:00:00.000000 1 0 5504 2014-08-06 2 9 28SEP2014:14:12:00.000000 1 0 5504 2014-09-28 2 10 28OCT2014:16:27:00.000000 1 0 5504 2014-10-28 2 11 03NOV2014:12:10:00.000000 1 0 140368 2014-11-03 2 12 15NOV2014:13:00:00.000000 1 0 5504 2014-11-15 2 13 03DEC2014:15:59:00.000000 1 0 5504 2014-12-03 2 14 23JAN2015:14:15:00.000000 1 0 5504 2015-01-23 2 15 21FEB2015:15:28:00.000000 1 0 5504 2015-02-21 2 16 06MAY2015:12:00:00.000000 1 0 5504 2015-05-06 2 17 06JUN2015:00:50:00.000000 1 0 5504 2015-06-06 2 18 10SEP2015:21:30:00.000000 1 0 140368 2015-09-10 3 19 13JUN2011:14:41:00.000000 1 0 5504 2011-06-13 3 20 19SEP2011:19:30:00.000000 1 0 5504 2011-09-19 3 21 12APR2013:16:05:00.000000 1 0 5504 2013-04-12 3 22 01SEP2016:14:29:00.000000 1 0 142091 2016-09-01 4 23 17AUG2012:18:05:00.000000 0 1 5507 2012-08-17 4 24 12SEP2012:18:59:00.000000 0 1 5507 2012-09-12 4 25 12OCT2012:14:12:00.000000 0 1 5507 2012-10-12 4 26 28FEB2013:13:42:00.000000 0 1 5507 2013-02-28 4 27 27NOV2012:13:42:00.000000 1 0 5503 2012-11-27 4 28 11JAN2013:16:20:00.000000 1 0 5503 2013-01-11 4 29 13JUL2013:16:22:00.000000 1 0 5503 2013-07-13 5 30 22AUG2010:16:48:00.000000 0 1 5507 2010-08-22 5 31 04MAR2013:17:07:00.000000 0 1 5507 2013-03-04 5 32 07JUL2007:14:45:00.000000 1 0 5504 2007-07-07 5 33 14JUL2009:17:20:00.000000 1 0 5503 2009-07-14 5 34 07JUL2011:18:34:00.000000 1 0 5504 2011-07-07 6 35 14JUN2016:13:30:00.000000 1 0 140368 2016-06-14 6 36 07DEC2016:18:00:00.000000 1 0 140368 2016-12-07 7 37 14FEB2013:10:55:00.000000 1 0 5504 2013-02-14 7 38 09AUG2013:18:00:00.000000 1 0 5504 2013-08-09 8 39 08NOV2015:15:15:00.000000 1 1 140966 2015-11-08 Run; Output dataset /*Get first occurrence */ Data malib.DKA_HHS_first; Set MALIB.DKA_HSS_SORTA6; Infile uniqueID encounterID discharged_date dka_event hhs_event diagnosis_id alldischarge dka_date hhs_date 1 1 25AUG2012:00:30:00.000000 0 1 5507 2012-08-25 19230 2 2 28MAR2014:11:59:00.000000 1 0 5503 2014-03-28 19810 2 3 29MAY2014:17:30:00.000000 1 0 5504 2014-05-29 2 4 12JUN2014:22:30:00.000000 1 0 5504 2014-06-12 2 5 19JUN2014:11:43:00.000000 1 0 5504 2014-06-19 2 6 17JUL2014:14:39:00.000000 1 0 5504 2014-07-17 2 7 29JUL2014:18:39:00.000000 1 0 5504 2014-07-29 2 8 06AUG2014:21:00:00.000000 1 0 5504 2014-08-06 2 9 28SEP2014:14:12:00.000000 1 0 5504 2014-09-28 2 10 28OCT2014:16:27:00.000000 1 0 5504 2014-10-28 2 11 03NOV2014:12:10:00.000000 1 0 140368 2014-11-03 2 12 15NOV2014:13:00:00.000000 1 0 5504 2014-11-15 2 13 03DEC2014:15:59:00.000000 1 0 5504 2014-12-03 2 14 23JAN2015:14:15:00.000000 1 0 5504 2015-01-23 2 15 21FEB2015:15:28:00.000000 1 0 5504 2015-02-21 2 16 06MAY2015:12:00:00.000000 1 0 5504 2015-05-06 2 17 06JUN2015:00:50:00.000000 1 0 5504 2015-06-06 2 18 10SEP2015:21:30:00.000000 1 0 140368 2015-09-10 3 19 13JUN2011:14:41:00.000000 1 0 5504 2011-06-13 18791 3 20 19SEP2011:19:30:00.000000 1 0 5504 2011-09-19 3 21 12APR2013:16:05:00.000000 1 0 5504 2013-04-12 3 22 01SEP2016:14:29:00.000000 1 0 142091 2016-09-01 4 23 17AUG2012:18:05:00.000000 0 1 5507 2012-08-17 19222 4 24 12SEP2012:18:59:00.000000 0 1 5507 2012-09-12 4 25 12OCT2012:14:12:00.000000 0 1 5507 2012-10-12 4 26 28FEB2013:13:42:00.000000 0 1 5507 2013-02-28 4 27 27NOV2012:13:42:00.000000 1 0 5503 2012-11-27 4 28 11JAN2013:16:20:00.000000 1 0 5503 2013-01-11 4 29 13JUL2013:16:22:00.000000 1 0 5503 2013-07-13 5 30 22AUG2010:16:48:00.000000 0 1 5507 2010-08-22 18496 5 31 04MAR2013:17:07:00.000000 0 1 5507 2013-03-04 5 32 07JUL2007:14:45:00.000000 1 0 5504 2007-07-07 5 33 14JUL2009:17:20:00.000000 1 0 5503 2009-07-14 5 34 07JUL2011:18:34:00.000000 1 0 5504 2011-07-07 6 35 14JUN2016:13:30:00.000000 1 0 140368 2016-06-14 20619 6 36 07DEC2016:18:00:00.000000 1 0 140368 2016-12-07 7 37 14FEB2013:10:55:00.000000 1 0 5504 2013-02-14 19403 7 38 09AUG2013:18:00:00.000000 1 0 5504 2013-08-09 8 39 08NOV2015:15:15:00.000000 1 1 140966 2015-11-08 20400 20400 Run; /*Composite Outcome*/ data work.composite; set MALIB.DKA_HSS_SORTA6; by UniqueID; if dka_event = 1 and hhs_event = 1 then output; run;

Schtroumpfette · ‎02-20-2023

Hello! I would like to make sure my SAS code is correct in identifying only the first occurrence of either of two outcomes, dka or hhs (meaning the first encounter where either outcome occurs). I would like to also identify the first composite outcomes (where both outcomes occur in the same encounter). Outcomes will be analyzed separately in a case control study. I included mock datasets samples for your convenience (MALIB.DKA_HSS_SORTA6 and malib.DKA_HHS_first) and I indicated this clearly in the attached SAS code. Thanks! Columns in the input dataset which only has dka and/or hhs cases based on diagnosisIDs: UniqueID: unique identifier specific to the patient. encounterID: hospital visit associated with the patient. Each patient can have multiple encounters throughout his existence in the cohort as they were admitted/discharged multiple times. dka_event: binary 0 or 1, event occurred or not for the patient in a specific encounter. hhs_event: binary 0 or 1, event occurred or not for the patient in a specific encounter. Discharged_date: The date the patient was discharged for that encounter. DiagnosisID: diagnosis_ids for either of the two outcomes: dka or hhs. There are multiple ids for each outcome and it is possible one encounter lists multiple ids. The IDs for either outcome are different. Alldischarge: formatted dischaged_date. Output dataset: Same as above and these two addiitonal columns: Dka_date: based on alldischarged date which is the date of first occurrence HHS_date: based on alldischarged date which is the date of first occurrence No errors in code when I ran it each step or sql. In the final datasets, there are duplicateIDs, but different encountersIDs for each patient. Thanks so much for all your help! Nisrine /*Acquire diagnosisIDS */ proc sql; create table malib.dkahhscases as select encounterID, diagnosisID from work.f_diagnosis where diagnosisID in (/*DKA*/ 5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201, 147333, 140368, 16755 , 16756, /*HHS event*/ 102941, 556, 5507, 5508 , 140966, 16758, 16757); quit; /*Create binary variables 0 and 1 for outcomes of interest*/ proc sql; create table malib.DKA_HHS as select *, /*DKA event*/ (case when diagnosisID in (5504, 5503, 555, 102940, 139200, 142092, 147334, 142091, 140369, 139201, 147333, 140368, 16755 , 16756 ) then 1 else 0 end) as DKA_event, /*HHS event*/ (case when diagnosisID in (102941, 556, 5507, 5508 , 140966, 16758, 16757 ) then 1 else 0 end) as HHS_event from malib.dkahhscases; quit; /*check if duplicate encounters but don’t use- there are duplicate encounters but don’t use the deduped dataset*/ proc sort data = malib.dka_hhs out =duplicate1 nodupkey; by encounterID; run; /*sort by encounterID*/ proc sort data = malib.dka_hhs out = malib.dka_hhs_S; by encounterID; run; /*Collapse Rows to Get Unique Encounters - Keep Max Value for diagnosisID—because one encounter may have multiple diagnosisIDs for either outcome and I don’t want to loose any cases*/ proc sql; create table malib.dka_hhs_C as select encounterID, diagnosisID, max(dka_event) as dka_event, max(hhs_event) as hhs_event from malib.dka_hhs_S group by encounterID; quit; /* grouped by */ /*this is a final table of only DKA and HHS events in the full database - when encounterID is not deduped*/ title 'proc freq - unique encounters with diagnosisIDs of ineterst'; proc freq data=malib.dka_hhs_C; table DKA_event hhs_event; run; /*DKA and HHS frequencies are higher if I don’t dedupe by encounter_ids*/ /*Dedupe by encounterID now that I collapsed*/ proc sort data = malib.dka_hhs_C out = malib.dka_hhs_CD nodupkey; by encounterID; run; /*this is a table of only DKA and HHS events in full dataset*/ title 'proc freq - unique encounters with diagnosisIDs of ineterst if encounterID is deduped'; proc freq data=malib.dka_hhs_CD; table DKA_event hhs_event; run; /**/ /*USE malib.dka_hhs_CD*/ /*Inner Join Acquire Encounters for Adult Inpatients with DKA or HHS*/ /*link to encounter_patient to get the dischaged_date*/ proc sql; create table malib.dka_hss_sortA2 as select a.*, b.dka_event, b.hhs_event, b.diagnosisID from nhlib.enc_patient as a inner join malib.dka_hhs_CD as b on a.encounterID =b.encounterID; quit; /*74710 rows and 6 columns. */ proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey; by encounterID; run; /*zero duplicate values so deleted malib.dka_hss_sortA3 - it was a quality check only*/ *In full dataset but only with a dataset with DKA and HHS; Title 'DKA frequency in full dataset inpatients'; proc freq data = malib.dka_hss_sortA2; table dka_event; run; *; proc freq data = malib.dka_hss_sortA2; table HHS_event; run; *; Title 'UniqueID - malib.dka_hss_sortA2'; proc sql; select count(distinct UniqueID) from malib.dka_hss_sortA2; quit; *; Title 'encounter sk - malib.dka_hss_sortA2'; proc sql; select count(distinct encounterID) from malib.dka_hss_sortA2; quit; /**/ /* Are there missing UniqueIDs*/ data malib.dka_hss_sortA2B; set malib.dka_hss_sortA2; if UniqueID = . then output; run; /*Some missing Unique IDs to delete later */ /*Testing to see if there are duplicate sks and there are, but no duplicate encounterIDs*/ proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3 nodupkey; by UniqueID; run; /*don't use , only for checking purposes - dataset deleted*/ /*run without deduping by UniqueID*/ proc sort data = malib.dka_hss_sortA2 out = malib.dka_hss_sortA3; by UniqueID; run; /*one patient with different encounters but seems to be the same diagnosisID-- I collapsed above*/ data malib.dka_hss_sortA4; set malib.dka_hss_sortA3; if UniqueID = . then delete; run; /*deleted missing UniqueIDs*/ /*Acquire first occurrence*/ /* Sort data by UniqueID and discharged_date */ proc sort data=MALIB.DKA_HSS_SORTA4; by UniqueID dischaged_date; run; /*74325 observations and 6 variables*/ /*Reorder Varariables*/ proc sql; create table MALIB.DKA_HSS_SORTA5 as select UniqueID, encounterID, dischaged_date, dka_event, hhs_event , diagnosisID from MALIB.DKA_HSS_SORTA4; quit; /* */ /*sort again*/ proc sort data = MALIB.DKA_HSS_SORTA5; by UniqueID dka_event dischaged_date ; run; /*INPUT DATASET – included a sample*/ /*all discharge is the formatted dischaged_date */ data MALIB.DKA_HSS_SORTA6; set MALIB.DKA_HSS_SORTA5; alldischarge = datepart(dischaged_date); format alldischarge YYMMDD8.; run; /*Output DATASET – included a sample*/ /*Get first occurrence */ data malib.DKA_HHS_first; set MALIB.DKA_HSS_SORTA6; by UniqueID; if first.UniqueID and dka_event=1 then DKA_date = alldischarge; if first.UniqueID and hhs_event=1 then hhs_date = alldischarge; run; *it worked - 74325 observations and 7 variables; /*Is the above code for first occurrence, correct? */ /*how can I only keep the first occurrence?*/ /*Composite Outcome*/ data work.composite; set MALIB.DKA_HSS_SORTA6; by UniqueID; if dka_event = 1 and hhs_event = 1 then output; run; /*How can I get the composite outcome based on first occurrence?*/

Schtroumpfette · ‎01-27-2023

This code worked. Thank you all for your help. data tomerge; set Tableb; by uniqueID; if first.uniqueID; run;

Schtroumpfette · ‎01-09-2023

Thanks again Tom, I will look at all of this and get back to you.

Schtroumpfette · ‎01-09-2023

Thanks Tom for your quick response. Race and gender are in TableB only and not in TableA. My issue is one more observations than exist in TableA. The code works, just with more observations. Is there a code to select only one GENDER and RACE from TABLEB before trying to merge it onto TABLEA? I will also try the data step code below and update you (the merge one), thanks. Nisrine

Schtroumpfette · ‎01-09-2023

Hello, Here is my issue. I am linking TableA to TableB using inner join to acquire additional two variables from TableB. TableA: Master cohort table has uniqueID (unique identifier, each identifier representing 1 patient), encounterID (each uniqueID can have multiple encounters, which means the same patient had multiple hospital admissions), age, and 55 other variables such as DrugA (binary), diagnosisID, date of cohort entry, etc. It was prepared linking different other tables together. TableB has uniqueID (unique identifier), gender and race, two additional variables I would like to add to TableA. Common identifier for TableA and TableB is uniqueID-nothing else in common. Two issues: 1. Due to database errors, in rare instances, some patients may have different race or gender overtime. This is not the norm though. 2. The inner join is generating more rows or observations that what is now in TableA. Based on my research, an inner join can generate more observations since it repeats matching row in Table B for each row in Table A. (a*b for each row). The inner join works but I cannot open TableC because it is too big, but the uniqueID and EncounterID distinct count is the same as TableA. Total in Table C would be 57 variables. Note these are huge tables: -Table A : 1.7 million uniqueID and 3.4 million encounters (distinct counts) -TableB: Over 60 million unique patients, but only interested in the common 1.7 million in TableA. TableA has duplicate IDs and encounters and at this point, the decision is not to delete these duplicates for different reasons, and to ensure we don’t loose any diagnosed cases. I found I can use the Select Function to just limit to TableA and use left join. So here is my code but it is generating errors and I tried several changes based on what I found online and it is still the same issue. proc sql; create TableC as select a.*, b.gender, b.race from TableA as a left join (select distinct uniqueID from TableB) TableB as b on a.uniqueID = b.uniqueID; quit; Your help will be much appreciated! Thanks, Shtroumpfette

Online Status	Offline
Date Last Visited	‎05-08-2023 01:14 AM

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Proc Logistic Case Control

Re: Keeping first occurrence of an outcome

Re: Keeping first occurrence of an outcome

Re: Keeping first occurrence of an outcome

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Inner Join creating more rows than Master Table

Re: Inner Join creating more rows than Master Table

Re: Calculating Person-Years by Age

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Re: Proc Logistic Case Control

Proc Logistic Case Control

Re: Keeping first occurrence of an outcome

Re: Keeping first occurrence of an outcome

Re: Keeping first occurrence of an outcome

Keeping first occurrence of an outcome

Re: Inner Join creating more rows than Master Table

Re: Inner Join creating more rows than Master Table

Re: Inner Join creating more rows than Master Table

Inner Join creating more rows than Master Table