Hi everyone,
I have a dataset similar to the one provided below, with multiple measurements per patient taken roughly every 6 months. These patients are enrolled in a kidney disease registry where:
Reason = reason for termination from the registry (categorical variable with 1=transplant, 2=dialysis, 3=death).
Termdt = termination date.
eGFR = glomerular filtration rate value (continuous).
COVA = binary time-varying covariate.
CENSDATE = termination date, if available. Otherwise, it’s the last visit date.
SURVTIME = CENSDATE – ENROLLMENTDT.
Time = VISITDT – FIRSTVISITDT.
The primary outcome is reason for termination. I’m interested in modeling the association between reason for termination and race while controlling for the rate of eGFR decline (slope) and for a time-varying covariate (COVA). The hypothesis is that a lower proportion of racial minorities will have a transplant rather than dialysis at the time of end-stage renal disease onset (i.e., time of termination from the registry), adjusted for covariates.
My questions:
1. How do I get the rate of decline or slope for eGFR to then use as a covariate in my model?
I’ve read some studies that used a linear mixed-effects model to get individual-specific annual eGFR slopes for each participant. I tried the code below, with time from enrollment until the date of eGFR measurement (the “time” variable in my mock dataset) as a fixed effect and patient (“ID”) as a random effect. I obtained slope estimates for each participant, but I’m not sure they are correct: in the articles I’ve been reading the eGFR slopes seemed much bigger, whereas mine are all < 1. Can someone please advise whether this seems like the right approach?
2. For modeling the association between reason for termination from Registry (outcome) and race (predictor), controlling for eGFR slope (covariate) and COVA (covariate), would multinomial logistic regression be a good choice? If yes, how do I specify the model to account for the fact that one of my covariates (COVA) is a time-varying binary variable?
Thank you for any insights/guidance.
data mock_dataset;
input ID$ Visit$ Enrollmentdt:mmddyy. Visitdt:mmddyy. FirstVisitdt:mmddyy. LastVisitdt:mmddyy. reason$ Termdt:mmddyy. eGFR Race$ COVA$ CENSDATE:mmddyy. SURVTIME time;
format Visitdt mmddyy10. Enrollmentdt mmddyy10. FirstVisitdt mmddyy10. LastVisitdt mmddyy10. Termdt mmddyy10. CENSDATE mmddyy10.;
datalines;
001 0 1/1/2022 1/1/2022 1/1/2022 6/15/2023 . . 25.15 0 0 6/15/2023 530 0
001 1 1/1/2022 7/5/2022 1/1/2022 6/15/2023 . . 32.33 0 0 6/15/2023 530 185
001 2 1/1/2022 1/7/2023 1/1/2022 6/15/2023 . . 28.77 0 0 6/15/2023 530 371
001 3 1/1/2022 6/15/2023 1/1/2022 6/15/2023 . . 35.01 0 1 6/15/2023 530 530
002 0 2/10/2021 2/10/2021 2/10/2021 1/12/2023 2 10/25/2023 42.94 1 0 10/25/2023 987 0
002 1 2/10/2021 8/1/2021 2/10/2021 1/12/2023 2 10/25/2023 40.32 1 0 10/25/2023 987 172
002 2 2/10/2021 3/4/2022 2/10/2021 1/12/2023 2 10/25/2023 38.11 1 0 10/25/2023 987 387
002 3 2/10/2021 7/15/2022 2/10/2021 1/12/2023 2 10/25/2023 36.39 1 0 10/25/2023 987 520
002 4 2/10/2021 1/12/2023 2/10/2021 1/12/2023 2 10/25/2023 34.52 1 0 10/25/2023 987 701
003 0 10/12/2021 10/12/2021 10/12/2021 4/15/2023 1 5/6/2024 58.19 0 0 5/6/2024 937 0
003 1 10/12/2021 5/5/2022 10/12/2021 4/15/2023 1 5/6/2024 61.92 0 1 5/6/2024 937 205
003 2 10/12/2021 11/20/2022 10/12/2021 4/15/2023 1 5/6/2024 54.02 0 1 5/6/2024 937 404
003 3 10/12/2021 4/15/2023 10/12/2021 4/15/2023 1 5/6/2024 46.15 0 1 5/6/2024 937 550
004 0 9/25/2020 9/25/2020 9/25/2020 4/18/2022 . . 20.2 1 0 4/18/2022 570 0
004 1 9/25/2020 3/5/2021 9/25/2020 4/18/2022 . . 28.5 1 0 4/18/2022 570 161
004 2 9/25/2020 10/27/2021 9/25/2020 4/18/2022 . . 26.61 1 0 4/18/2022 570 397
004 3 9/25/2020 4/18/2022 9/25/2020 4/18/2022 . . . 1 0 4/18/2022 570 570
005 0 2/9/2021 2/9/2021 2/9/2021 1/15/2022 3 6/5/2023 35.96 0 0 6/5/2023 340 0
005 1 2/9/2021 8/18/2021 2/9/2021 1/15/2022 3 6/5/2023 23.25 0 1 6/5/2023 340 190
005 2 2/9/2021 1/15/2022 2/9/2021 1/15/2022 3 6/5/2023 21.98 0 0 6/5/2023 340 340
006 0 12/23/2022 12/23/2022 12/23/2022 5/15/2023 . . 30.33 1 0 5/15/2023 143 0
006 1 12/23/2022 5/15/2023 12/23/2022 5/15/2023 . . 28.06 1 1 5/15/2023 143 143
007 0 9/25/2021 9/25/2021 9/25/2021 4/29/2022 2 2/15/2024 19.8 0 1 2/15/2024 873 0
007 1 9/25/2021 4/29/2022 9/25/2021 4/29/2022 2 2/15/2024 22.01 0 1 2/15/2024 873 216
008 0 11/16/2020 11/16/2020 11/16/2020 12/15/2021 1 6/30/2023 10.2 1 1 6/30/2023 956 0
008 1 11/16/2020 5/20/2021 11/16/2020 12/15/2021 1 6/30/2023 13.51 1 0 6/30/2023 956 185
008 2 11/16/2020 12/15/2021 11/16/2020 12/15/2021 1 6/30/2023 12.85 1 0 6/30/2023 956 394
009 0 9/17/2020 9/17/2020 9/17/2020 11/8/2022 1 1/25/2024 15.58 0 0 1/25/2024 1225 0
009 1 9/17/2020 4/15/2021 9/17/2020 11/8/2022 1 1/25/2024 20.81 0 . 1/25/2024 1225 210
009 2 9/17/2020 10/10/2021 9/17/2020 11/8/2022 1 1/25/2024 28.28 0 1 1/25/2024 1225 388
009 3 9/17/2020 5/25/2022 9/17/2020 11/8/2022 1 1/25/2024 25.4 0 . 1/25/2024 1225 615
009 4 9/17/2020 11/8/2022 9/17/2020 11/8/2022 1 1/25/2024 26.9 0 . 1/25/2024 1225 782
010 0 7/21/2020 7/21/2020 7/21/2020 8/8/2022 2 10/9/2023 49.8 1 0 10/9/2023 1175 0
010 1 7/21/2020 1/15/2021 7/21/2020 8/8/2022 2 10/9/2023 35.91 1 0 10/9/2023 1175 178
010 2 7/21/2020 8/25/2021 7/21/2020 8/8/2022 2 10/9/2023 . 1 . 10/9/2023 1175 400
010 3 7/21/2020 2/12/2022 7/21/2020 8/8/2022 2 10/9/2023 28.45 1 0 10/9/2023 1175 571
010 4 7/21/2020 8/8/2022 7/21/2020 8/8/2022 2 10/9/2023 25.36 1 1 10/9/2023 1175 748
011 0 12/14/2022 12/14/2022 12/14/2022 12/14/2022 . . 52.4 0 0 12/14/2022 0 0
012 0 4/3/2021 4/3/2021 4/3/2021 10/18/2021 2 12/28/2023 14.3 1 0 12/28/2023 999 0
012 1 4/3/2021 10/18/2021 4/3/2021 10/18/2021 2 12/28/2023 10.82 1 0 12/28/2023 999 198
013 0 6/7/2019 6/7/2019 6/7/2019 7/18/2020 1 9/5/2023 28.2 0 0 9/5/2023 1551 0
013 1 6/7/2019 12/29/2019 6/7/2019 7/18/2020 1 9/5/2023 25.16 0 0 9/5/2023 1551 205
013 2 6/7/2019 7/18/2020 6/7/2019 7/18/2020 1 9/5/2023 18.74 0 0 9/5/2023 1551 407
014 0 5/18/2022 5/18/2022 5/18/2022 5/18/2022 . . 13.1 1 0 5/18/2022 0 0
015 0 3/27/2019 3/27/2019 3/27/2019 4/20/2021 2 6/22/2023 22.01 0 0 6/22/2023 1548 0
015 1 3/27/2019 9/20/2019 3/27/2019 4/20/2021 2 6/22/2023 22.17 0 0 6/22/2023 1548 177
015 3 3/27/2019 10/12/2020 3/27/2019 4/20/2021 2 6/22/2023 25.6 0 1 6/22/2023 1548 565
015 4 3/27/2019 4/20/2021 3/27/2019 4/20/2021 2 6/22/2023 . 0 0 6/22/2023 1548 755
;
run;
proc print data=mock_dataset; run;
proc mixed data=mock_dataset;
class ID;
model eGFR=time/solution;
random int time/subject=ID type=un solution;
ods output solutionf=sf(keep=effect estimate rename=(estimate=overall));
ods output solutionr=sr(keep=effect ID estimate /*rename=(estimate=ssdev)*/);
run;
proc print data=sr; run;
With this mixed model:
proc mixed data=mock_dataset;
class ID;
model eGFR=time/solution;
random int time/subject=ID type=un solution;
TIME is a covariate and EGFR is the dependent variable. You can get the slope on TIME for each level of ID by combining the results from the two SOLUTIONs tables. SOLUTIONF, from the MODEL statement, gives you the overall slope on TIME while SOLUTIONR, from the RANDOM statement, gives you the adjustment to that overall slope for each level of ID.
If REASON (the reason for termination from the registry) is a nominal or ordinal variable, then you can use PROC GLIMMIX to model it with a multinomial logistic regression. Put COVA on the MODEL statement as a predictor, and GLIMMIX will use the correct DF to test that effect. You do not need an option to specify COVA as a time-varying covariate. GLIMMIX will detect that.
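A minimal sketch of what such a GLIMMIX call could look like for a nominal outcome, using a generalized logit link. The dataset name `analysis` and the variable `egfr_slope` are hypothetical: they stand for a dataset that already carries each subject's eGFR slope alongside REASON, Race, and COVA. Choosing dialysis (`'2'`) as the reference category is also an assumption, made so that the transplant-versus-dialysis contrast is reported directly.

```sas
/* Hypothetical input: ANALYSIS holds REASON, Race, COVA and a       */
/* per-subject slope variable EGFR_SLOPE (names are assumptions).    */
proc glimmix data=analysis;
  class Race COVA;
  /* Generalized logit model for a nominal response;                 */
  /* REF='2' makes dialysis the reference category.                  */
  model reason(ref='2') = Race egfr_slope COVA
        / dist=multinomial link=glogit solution;
run;
```

Observations with a missing REASON are not used in fitting the model, so it is worth checking how many subjects actually contribute before interpreting the estimates.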
@StatsMan , so you're suggesting something like this, where sscoeff will give the individual specific slope estimate?
proc mixed data=mock_dataset;
class ID;
model eGFR = time/solution;
random int time/ type=un subject=ID solution;
ods output solutionf=sf(keep=effect estimate rename=(estimate=overall));
ods output solutionr=sr(keep=effect ID estimate rename=(estimate=ssdev));
run;
proc sort data=sf; by effect; run;
proc sort data=sr; by effect; run;
/* one-to-many merge: the overall estimate is carried to every subject's row for the same effect */
data final; merge sf sr; by effect; sscoeff = overall + ssdev; run;
So when I want to include the eGFR slope as a covariate in my PROC GLIMMIX, what do I use, sscoeff? The outcome variable is REASON for termination from the registry, and it is nominal with three levels: transplant, dialysis, or death. One thing to note, which isn't obvious in the mock dataset, is that the vast majority of participants have missing data for REASON.
You will need to be careful with the merging. If the merging is done correctly, then SSCOEFF will have the slope for each subject. You might want to take out the intercept terms in the two SOLUTION tables before doing the merge, since they are not of interest to you.
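One way to do this without a BY-merge is sketched below. It makes two assumptions that are not part of the thread: that TIME is measured in days (it is a difference of two SAS dates), so the slope is rescaled to a per-year value through a helper variable `time_yr`, and that the dataset names `mock_years`, `sf_slope`, and `slopes` are free to use. Per-day slopes are much smaller than the annual slopes usually reported, which may explain estimates below 1.

```sas
/* Assumption: TIME is in days, so convert to years for annual slopes. */
data mock_years;
  set mock_dataset;
  time_yr = time / 365.25;          /* hypothetical helper variable */
run;

proc mixed data=mock_years;
  class ID;
  model eGFR = time_yr / solution;
  random int time_yr / subject=ID type=un solution;
  ods output solutionf=sf(keep=effect estimate rename=(estimate=overall))
             solutionr=sr(keep=effect ID estimate rename=(estimate=ssdev));
run;

/* Keep only the fixed-effect slope row (drops the intercept). */
data sf_slope;
  set sf;
  where upcase(effect) = 'TIME_YR';
  drop effect;
run;

/* Attach the single overall slope to each subject's slope deviation. */
data slopes;
  if _n_ = 1 then set sf_slope;     /* OVERALL is retained on every row */
  set sr(where=(upcase(effect) = 'TIME_YR'));
  sscoeff = overall + ssdev;        /* subject-specific annual slope */
  keep ID sscoeff;
run;
```

The resulting `slopes` dataset has one row per ID and can be merged back by ID onto whichever dataset feeds the outcome model.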
Noted, thank you for the helpful suggestion! Another question: I assume the individual slopes obtained with this code are overall slopes over the entire 5 years of follow-up (from time=0 to time=5 years), right? If so, could I also obtain individual slopes for each of the 6-month intervals from baseline to 5 years?
I suppose you could do that. You will need to segment the interval so that you have enough data points to do a regression with the 6 month point at the middle.
SteveDenham
What @SteveDenham said. You will need a lot of data to segment the slopes into ten 6-month intervals.
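Before attempting interval-specific slopes, it may help to see how many eGFR measurements each subject actually has per 6-month window. The sketch below assumes TIME is in days and uses hypothetical names (`mock_intervals`, `interval`); it only tabulates data availability and does not fit any model.

```sas
/* Assign each visit to a 6-month window, assuming TIME is in days. */
data mock_intervals;
  set mock_dataset;
  interval = floor(time / 182.625);  /* 0 = first 6 months, 1 = next 6 months, ... */
run;

/* Count non-missing eGFR measurements per subject and window. */
proc freq data=mock_intervals;
  where eGFR is not missing;
  tables ID*interval / norow nocol nopercent;
run;
```

With visits roughly every 6 months, most cells will hold a single measurement, which is too few to estimate a slope within a window.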