chris22
Calcite | Level 5

My study design is as follows:

 

N subjects received MRI examinations. In a subset of cases, one or more sequences (i.e., separate “components” of the examination) were of insufficient image quality, e.g., due to subject motion, and were therefore reacquired. For every sequence with more than one acquisition, exactly one was selected for further use, based solely on visual impression (even if all acquisitions were quite “bad”, one of them still had to be selected). For every acquisition there is also a set of quantitative (continuous) quality parameters.

Sample size: 80-400 subjects per sequence

Acquisitions: 1-4, most often 2

Parameters: 3-10 continuous variables per sequence (e.g. sharpness, noise, signal-to-noise ratio)

I want to predict the binary outcome of an acquisition being selected or not selected for use (dependent variable) from the quantitative quality parameters (multiple independent predictor variables, within-subject comparison). I find it complex because acquisitions were not assessed and categorized individually, but always in comparison with at least one other acquisition of the same sequence within a subject. These acquisitions "mutually exclude" one another, since only one of them could be selected for further use.

 

So far I'm thinking this should be approached as a repeated-measures problem using a mixed-effects model.

 

I would use the following to assess any parameter individually: 

proc mixed data=all;
class id;
model selected=param1;
repeated /subject=id;
where sequence="seq1"; 
run;

 

But what I'm really interested in is whether or not I can predict the selection from looking at all parameters together. Does it therefore make sense to simply write:

...
model selected=param1*param2*param3*...*paramn; ...

 

It probably doesn't, but I would be grateful if someone could guide me in the right direction.

ACCEPTED SOLUTION
SteveDenham
Jade | Level 19

A key thing here is that your response variable is binary.  That means that the errors are not normally distributed.  Consequently, I suggest moving your analysis to PROC GLIMMIX if you want estimates that are conditional on the repeated values, or PROC GEE (or GENMOD) if you want marginal estimates.  You should also check for collinearity (see response from @PaigeMiller ) in your predictor variables.  Since collinearity is usually not dependent on the response variable, you could use PROC REG for this.  For now, let's assume that all 10 predictors end up in the model.  Again, @PaigeMiller has given you a lead - fit a model with each as a main effect.  Here is what I would use in PROC GLIMMIX:
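The collinearity screen with PROC REG that SteveDenham suggests could look something like the following sketch (the dataset and variable names are taken from the thread; the response variable is irrelevant here, since variance inflation factors are computed from the predictors alone):

proc reg data=all(where=(sequence="seq1"));
   /* VIF and TOL request variance inflation factors and tolerances
      for each predictor; VIFs well above ~10 are a common warning
      sign of problematic collinearity */
   model selected=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10 / vif tol;
run;
quit;

If some parameters show very high VIFs, one could drop or combine the offending predictors before moving on to the GLIMMIX or GENMOD fits below.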

 

proc glimmix data=all(where= (sequence="seq1"));
class id;
model selected=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=binary;
random _residual_ /subject=id; 
run;

For PROC GENMOD, it probably would not be greatly different.  Perhaps like this (see the GEE example in the PROC GENMOD documentation):

 

proc genmod data=all(where= (sequence="seq1"));
class id;
model selected(event='1')=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=bin;
repeated subject=id /corr=exch corrw;
run;
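Since PROC GEE is also mentioned above as a route to marginal estimates, an equivalent call might look like this (a sketch only; PROC GEE requires a recent SAS/STAT release, and I am assuming its MODEL and REPEATED options mirror GENMOD's):

proc gee data=all(where=(sequence="seq1"));
class id;
/* same marginal logistic model; TYPE=EXCH requests an
   exchangeable working correlation within subject */
model selected(event='1')=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=bin;
repeated subject=id / type=exch;
run;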

SteveDenham

 

(PS. That 10-way interaction approach would likely only be needed if your predictors were all categorical and you had missing cells.  Otherwise, it falls into the "Can I fit noise with this model?" category.)

 

 


3 REPLIES
PaigeMiller
Diamond | Level 26

@chris22 wrote:

Does it therefore make sense to simply write:

...
model selected=param1*param2*param3*...*paramn; ...

 


You are asking if a 10-way interaction should be fit. I have never seen a model where a ten-way interaction was needed. Perhaps you really want a model with 10 main effects, which is

 

model selected=param1 param2 param3 ... paramn;

without the asterisks. This may be reasonable, but you do run into the problem of multicollinearity, which can inflate the standard errors of your estimates and can even give them the wrong sign.

--
Paige Miller
chris22
Calcite | Level 5
Thank you both for your very detailed and most helpful comments! I will follow your advice closely (and carefully check for multicollinearity).

