Hello All,
I am trying to estimate for a given indication :
1. the effect of sex, age, treatment and dose on the interval between injections (of the treatment injected in the hospital) over time.
2. the effect of sex, age, treatment on the dose (qualitative, dosing interval) over time.
I also want to include the hospital experience in the analysis. There are 2 options for this: nested the patient in the hospital or include the number of patients injected at hospital level and a dummy indicating whether the patient change the hospital or not.
Patients don't receive the same number of injections (actually the number of injections varies from 1 to 19) and cannot switch between treatment over time.
data Have;
*infile datalines ;
input pat hospital Sex $ Age_at_initiation treat $ change_hosp dose_units dose_interval $ Inj_nr date mmddyy10.;
format date mmddyy10.;
cards;
3 9 M 80 tttB 1 1600 Off-label 1 02/19/2015
3 13 M 80 tttB 1 1200 High 2 08/31/2015
3 13 M 80 tttB 1 1200 High 3 01/10/2016
1 1 M 69 tttA 1 100 Low 1 12/02/2012
1 1 M 69 tttA 1 100 Low 2 05/05/2013
1 1 M 69 tttA 1 200 Low 3 07/07/2013
1 2 M 69 tttA 1 200 Moderate 4 11/15/2013
1 2 M 69 tttA 1 300 Moderate 5 04/24/2014
2 6 F 25 tttB 0 300 Low 1 03/27/2016
2 6 F 25 tttB 0 500 Moderate 2 10/04/2016
;;;;
For the first model, the dependent variable "interval" is the difference between consecutive date of injections.
I have tried the following code:
proc mixed data=have covtest ;
class treat sex hospital;
model interval = age sex treat dose_ interval inj_nr / CHISQ S covb ;
random int / subject=pat(hospital) type=UN G ;
run;
My questions are:
a. Is this model specification answer to my research question?
b. How to validate and interpret the results?
Thank you in advance for your time and help.
Regards
@Ely , here are some thoughts:
How to treat the intercept? Well, it is the expected value for age=0, male, tttB, offlabel and injnr=0.
This is a case where the F tests and any lsmeans you have will be substantially more informative than the solution vector. Because PROC MIXED uses a non-full rank parameterization, various categories are set to 0 in the solution. So, look at the residual plots, F tests and LSMEANS using the AT= option.
Should I run the model using PROC GLIMMIX with a gamma link and select the model with lower AIC? That would be a gamma distribution. Your code would look something like:
/* Gamma distribution */
proc glimmix data=have method=laplace ;
class treat sex hospital;
model interval = age sex treat dose_ interval inj_nr / S dist=gamma chisq;
random int / subject=pat(hospital) type=UN G ;
run;
/* Gaussian distribution */
proc glimmix data=have method=laplace ;
class treat sex hospital;
model interval = age sex treat dose_ interval inj_nr / S dist=gaussian chisq;
random int / subject=pat(hospital) type=UN G ;
run;
Selecting the distribution with the smaller AIC ought to give you the distribution that retains the most information in the data. Note that I have fit both using METHOD=LAPLACE, as the AIC wouldn't be calculated for the gamma distribution using the default pseudo-likelihood linearization.
Again, get some plots to give you an idea of how well the model fits.
SteveDenham
Before starting on this, I have a question. In the data, I see pat=1 for two hospitals. My question is whether this patient was seen at both hospitals, or is it a situation where patient 1 is unique per hospital. That will make a difference in your RANDOM statements.
Next, I suspect that interval is not a variable with Gaussian error. Intervals like this give rise to Poisson distributed counts, and are generally gamma distributed. If that is the case, you will likely need to shift to PROC GLIMMIX. Next, what about interactions between your independent variables? Are you willing to assume that males on tttA have the same effect on interval as females on tttB? If you can make that assumption, then the model statement looks OK. And last, how many unique patients are involved. To have pat as a continuous effect, you will almost certainly need to sort your data by hospital and patient.
Given all of that, you have a pretty good model. What sort of results are you obtaining?
SteveDenham
Hello @SteveDenham,
Thank you for your quick reply.
A patient can be followed-up over time between different hospitals with the same unique ID.
Regarding this statement: " Next, what about interactions between your independent variables? Are you willing to assume that males on tttA have the same effect on interval as females on tttB? If you can make that assumption, then the model statement looks OK. And last, how many unique patients are involved. To have pat as a continuous effect, you will almost certainly need to sort your data by hospital and patient."
How to interpret the intercept ?
Regarding this statement: "Next, I suspect that interval is not a variable with Gaussian error. Intervals like this give rise to Poisson distributed counts, and are generally gamma distributed. If that is the case, you will likely need to shift to PROC GLIMMIX. "
Regards
@Ely , here are some thoughts:
How to treat the intercept? Well, it is the expected value for age=0, male, tttB, offlabel and injnr=0.
This is a case where the F tests and any lsmeans you have will be substantially more informative than the solution vector. Because PROC MIXED uses a non-full rank parameterization, various categories are set to 0 in the solution. So, look at the residual plots, F tests and LSMEANS using the AT= option.
Should I run the model using PROC GLIMMIX with a gamma link and select the model with lower AIC? That would be a gamma distribution. Your code would look something like:
/* Gamma distribution */
proc glimmix data=have method=laplace ;
class treat sex hospital;
model interval = age sex treat dose_ interval inj_nr / S dist=gamma chisq;
random int / subject=pat(hospital) type=UN G ;
run;
/* Gaussian distribution */
proc glimmix data=have method=laplace ;
class treat sex hospital;
model interval = age sex treat dose_ interval inj_nr / S dist=gaussian chisq;
random int / subject=pat(hospital) type=UN G ;
run;
Selecting the distribution with the smaller AIC ought to give you the distribution that retains the most information in the data. Note that I have fit both using METHOD=LAPLACE, as the AIC wouldn't be calculated for the gamma distribution using the default pseudo-likelihood linearization.
Again, get some plots to give you an idea of how well the model fits.
SteveDenham
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.