chris22
Calcite | Level 5

My study design is as follows:

 

N subjects received MRI examinations. In a subset of cases, one or more sequences (i.e., separate “components” of the examination) were of insufficient image quality, e.g., due to subject motion, and were therefore reacquired. For every sequence with more than one acquisition, exactly one was selected for further use, based solely on visual impression (even if all acquisitions were quite “bad”, one of them still had to be selected). For every acquisition there is also a set of quantitative (continuous) quality parameters.

Sample size: 80-400 subjects per sequence

Acquisitions: 1-4, most often 2

Parameters: 3-10 continuous variables per sequence (e.g. sharpness, noise, signal-to-noise ratio)

I want to predict the binary outcome of an acquisition being selected or not selected for use (dependent variable) from the quantitative quality parameters (multiple independent predictor variables, within-subject comparison). I find it complex because acquisitions were not assessed and categorized individually, but always in comparison with at least one other acquisition of the same sequence within a subject. These acquisitions "mutually exclude" one another, since only one of them could be selected for further use.

 

So far I'm thinking this should be approached as a repeated-measures problem using a mixed-effects model.

 

I would use the following to assess any parameter individually: 

proc mixed data=all;
class id;
model selected=param1;
repeated /subject=id;
where sequence="seq1"; 
run;

 

But what I'm really interested in is whether or not I can predict the selection from looking at all parameters together. Does it therefore make sense to simply write:

...
model selected=param1*param2*param3*...*paramn; ...

 

It probably doesn't, but I would be grateful if someone could guide me in the right direction.

ACCEPTED SOLUTION
SteveDenham
Jade | Level 19

A key thing here is that your response variable is binary.  That means that the errors are not normally distributed.  Consequently, I suggest moving your analysis to PROC GLIMMIX if you want estimates that are conditional on the repeated values, or PROC GEE (or GENMOD) if you want marginal estimates.  You should also check for collinearity (see response from @PaigeMiller ) in your predictor variables.  Since collinearity is usually not dependent on the response variable, you could use PROC REG for this.  For now, let's assume that all 10 predictors end up in the model.  Again, @PaigeMiller has given you a lead - fit a model with each as a main effect.  Here is what I would use in PROC GLIMMIX:
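The collinearity screen with PROC REG that SteveDenham suggests could look something like the following sketch (the dataset and variable names are taken from the thread; the response variable is irrelevant here, since variance inflation factors are computed from the predictors alone):

proc reg data=all(where=(sequence="seq1"));
   /* VIF and TOL request variance inflation factors and tolerances
      for each predictor; VIFs well above ~10 are a common warning
      sign of problematic collinearity */
   model selected=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10 / vif tol;
run;
quit;

If some parameters show very high VIFs, one could drop or combine the offending predictors before moving on to the GLIMMIX or GENMOD fits below.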

 

proc glimmix data=all(where= (sequence="seq1"));
class id;
model selected=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=binary;
random _residual_ /subject=id; 
run;

For PROC GENMOD, it probably would not be greatly different.  Perhaps like this (see the GEE example in the PROC GENMOD documentation):

 

proc genmod data=all(where= (sequence="seq1"));
class id;
model selected(event='1')=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=bin;
repeated subject=id /corr=exch corrw;
run;
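Since PROC GEE is also mentioned above as a route to marginal estimates, an equivalent call might look like this (a sketch only; PROC GEE requires a recent SAS/STAT release, and I am assuming its MODEL and REPEATED options mirror GENMOD's):

proc gee data=all(where=(sequence="seq1"));
class id;
/* same marginal logistic model; TYPE=EXCH requests an
   exchangeable working correlation within subject */
model selected(event='1')=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=bin;
repeated subject=id / type=exch;
run;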

SteveDenham

 

(PS. That 10-way interaction approach would likely only be needed if your predictors were all categorical and you had missing cells.  Otherwise, it falls into the "Can I fit noise with this model?" category.)

 

 


3 REPLIES
PaigeMiller
Diamond | Level 26

@chris22 wrote:

Does it therefore make sense to simply write:

...
model selected=param1*param2*param3*...*paramn; ...

 


You are asking if a 10-way interaction should be fit. I have never seen a model where a ten-way interaction was needed. Perhaps you really want a model with 10 main effects, which is

 

model selected=param1 param2 param3 ... paramn;

without the asterisks. This may be reasonable, but you do run into the problem of multicollinearity, which can inflate the standard errors of your estimates and can even give them the wrong sign.

--
Paige Miller
chris22
Calcite | Level 5
Thank you both for your very detailed and most helpful comments! I will follow your advice closely (and carefully check for multicollinearity).

