Solved: Logistic regression with repeated measures

patrick5 · Posted 07-22-2022 05:03 PM

I have a dataset with a binary outcome (pain/no pain following joint injection) in subjects that may have received joint injections in multiple joints on the same day, in the same joint on different days, or some combination thereof. The data structure is one row per injection:

PatientID  Date     Pain  Joint   VolumeInj  NumInjected  BW    ....
203451     2/15/21  0     Hip     1.0        2            30.2
203451     2/15/21  0     Carpus  0.25       2            30.2
203451     5/2/21   0     Hip     1.0        1            28.9
312599     3/27/21  1     Carpus  0.5        1            25.5
312599     7/20/21  0     Carpus  0.3        1            26.0
....

Here, Pain is the outcome, Joint is the joint injected, VolumeInj is the volume injected in the given joint, NumInjected is the number of joints injected at that visit, and BW is the body weight.
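For anyone who wants to experiment with this layout, the example rows can be read into a dataset roughly as follows. This is only a sketch: it covers just the five rows and seven columns shown, the `mmddyy10.` informat and the `$12.` length for Joint are assumptions, and the remaining columns (the `....`) are left out.

```sas
/* Sketch: rebuild the five example rows as a SAS dataset.            */
/* Only the columns shown in the post are read; any others are omitted. */
data injections;
	/* the colon modifier applies the informat under list input */
	input PatientID Date :mmddyy10. Pain Joint :$12. VolumeInj NumInjected BW;
	format Date mmddyy10.;
	datalines;
203451 2/15/21 0 Hip 1.0 2 30.2
203451 2/15/21 0 Carpus 0.25 2 30.2
203451 5/2/21 0 Hip 1.0 1 28.9
312599 3/27/21 1 Carpus 0.5 1 25.5
312599 7/20/21 0 Carpus 0.3 1 26.0
;
run;
```

Five rows are far too few to fit any model; this is only meant to make the one-row-per-injection structure concrete.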

Setting aside all the reasons this is a terrible study design (please, I am well aware!), the PI is hoping to look at predictors of pain following injection; I'll use Joint as an example. I have tried the following to account for the fact that patient and date may repeat:

proc glimmix data = injections;
	FORMAT pain eventf.;
	class joint patientid date;
	model pain = joint / dist = binary link = logit ddfm = bw;
	random _residual_ / subject = patientid;
	random date;
run;

The code executes and gives me results (using SAS 9.4), but I am not at all convinced I have structured this correctly (e.g., do I need to nest patientid in date?). Any suggestions/alternate approaches would be gratefully welcomed!

StatDave · Posted 07-22-2022 05:30 PM

One thing to decide is whether you need a subject-specific model, such as a random effects model in GLIMMIX, for the purpose of predicting the outcome at the subject level, or a population-averaged model to make population-level inferences, such as the effect of a predictor on the outcome. If the latter, then you could use PROC GEE to fit a Generalized Estimating Equations model. You only need to identify, with the SUBJECT= option in the REPEATED statement, which observations are correlated and which are not, and to choose a general form for the correlations within subjects. The validity of the GEE method does not depend on correctly specifying the exact correlation structure, so a simple structure, such as exchangeable, is often used. For example, you could do something like this to assess the overall effects of several predictors.

proc gee data = injections;
	class joint patientid;
	model pain(event="1") = joint BW ...  / dist=binomial;
	repeated subject = patientid / type=exch;
run;
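If the subject-specific route is what you need instead, a random-intercept model in GLIMMIX might be sketched as below. This is a sketch under assumptions, not a recommendation confirmed anywhere in this thread: it treats visits (Date) as nested within patients, gives each level its own random intercept, and uses METHOD=LAPLACE for the binary outcome. Those modeling choices are assumptions to check against your data.

```sas
/* Sketch only: subject-specific logistic model with random intercepts. */
/* Assumes visits (date) are nested within patients.                    */
proc glimmix data = injections method = laplace;
	class joint patientid date;
	model pain(event="1") = joint / dist = binary link = logit solution;
	/* patient-level random intercept */
	random intercept / subject = patientid;
	/* visit-level random intercept, nested within patient */
	random intercept / subject = date(patientid);
run;
```

With few repeat visits per patient, the visit-level variance may be estimated at or near zero or the model may fail to converge. In that case, dropping the second RANDOM statement gives a simpler patient-only model.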
