My study design is as follows:
N subjects received MRI examinations. In a subset of cases, one or more sequences (i.e. separate “components” of the examination) were of insufficient image quality, e.g. due to subject motion, and therefore reacquired. For all sequences with >1 acquisitions, exactly one was selected for further use solely based on visual impression (even if all acquisitions were quite “bad”, one of them still had to be selected). For all acquisitions there is also a set of quantitative (continuous) quality parameters.
Sample size: 80-400 subjects per sequence
Acquisitions: 1-4, most often 2
Parameters: 3-10 continuous variables per sequence (e.g. sharpness, noise, signal-to-noise ratio)
I want to predict the binary outcome of an acquisition being selected or not selected for use (dependent variable) based on the quantitative quality parameters (multiple independent predictor variables, within-subject comparison). I find it complex because acquisitions were not assessed and categorized individually, but always in comparison to at least one other acquisition of the same sequence within a subject. These acquisitions "mutually exclude" one another, since only one of them could be selected for further use.
So far I'm thinking this should be approached as a repeated-measures problem using a mixed-effects model.
I would use the following to assess any parameter individually:
proc mixed data=all;
class id;
model selected=param1;
repeated /subject=id;
where sequence="seq1";
run;
But what I'm really interested in is whether or not I can predict the selection from looking at all parameters together. Does it therefore make sense to simply write:
...
model selected=param1*param2*param3*...*paramn;
...
It probably doesn't, but I would be grateful if someone could guide me in the right direction.
Accepted Solutions
A key thing here is that your response variable is binary. That means that the errors are not normally distributed. Consequently, I suggest moving your analysis to PROC GLIMMIX if you want estimates that are conditional on the repeated values, or PROC GEE (or GENMOD) if you want marginal estimates. You should also check for collinearity (see response from @PaigeMiller ) in your predictor variables. Since collinearity is usually not dependent on the response variable, you could use PROC REG for this. For now, let's assume that all 10 predictors end up in the model. Again, @PaigeMiller has given you a lead - fit a model with each as a main effect. Here is what I would use in PROC GLIMMIX:
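proc glimmix data=all(where=(sequence="seq1"));
class id;
model selected=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=binary;
/* _residual_ requests an R-side (marginal) covariance structure; for subject-specific (conditional) estimates, a G-side term such as "random intercept / subject=id;" would be used instead */
random _residual_ /subject=id;
run;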
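For PROC GENMOD, it probably would not be greatly different. Perhaps like this (see the example in the documentation):
proc genmod data=all(where=(sequence="seq1"));
class id;
/* GENMOD names the distribution BINOMIAL (alias BIN); event='1' models the probability that selected=1 */
model selected(event='1')=param1 param2 param3 param4 param5 param6 param7 param8 param9 param10/dist=bin;
/* exchangeable working correlation within subject; corrw prints the working correlation matrix */
repeated subject=id /corr=exch corrw;
run;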
SteveDenham
(PS. That 10-way interaction approach would likely only be needed if your predictors were all categorical and you had missing cells. Otherwise, it falls into the "Can I fit noise with this model?" category.)
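The collinearity check mentioned in this reply could look roughly like the following. This is a minimal sketch, assuming the same data set `all`, the same `sequence` filter, and the predictor names `param1` to `param10` used elsewhere in this thread; the choice of the VIF, TOL, and COLLINOINT options is an illustration, not something specified by the poster.

```sas
/* Collinearity diagnostics only: the estimates and tests from this fit are  */
/* not meaningful for a binary, repeated-measures response. The diagnostics  */
/* depend on the predictors alone, so the response is just a placeholder.    */
proc reg data=all(where=(sequence="seq1"));
   model selected = param1 param2 param3 param4 param5
                    param6 param7 param8 param9 param10
                    / vif          /* variance inflation factor per predictor    */
                      tol          /* tolerance, the reciprocal of the VIF       */
                      collinoint;  /* condition indices, intercept adjusted out  */
run;
quit;
```

Predictors with large variance inflation factors (a commonly cited rule of thumb is values above about 10) are candidates for being dropped or combined before fitting the GLIMMIX or GENMOD model.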
@chris22 wrote:
Does it therefore make sense to simply write:
...
model selected=param1*param2*param3*...*paramn;
...
You are asking if a 10-way interaction should be fit. I have never seen a model where a ten-way interaction was needed. Perhaps you really want a model with 10 main effects, which is
model selected=param1 param2 param3 ... paramn;
without the asterisks. This may be reasonable, but you do run into the problem of multicollinearity, which can give your estimates very high standard errors and even the wrong sign.
Paige Miller
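To make the difference in MODEL syntax concrete, here is a small sketch using only three of the predictors. It assumes the data set `all` and the variable names from this thread and reuses the RANDOM statement from the accepted solution; the reduction to three predictors and the `@2` cap are illustrative choices only.

```sas
proc glimmix data=all(where=(sequence="seq1"));
   class id;

   /* Main effects only: one slope per predictor (the form suggested above). */
   model selected = param1 param2 param3 / dist=binary;

   /* For comparison (a PROC step takes only one MODEL statement):           */
   /*   param1*param2*param3    fits ONLY the three-way product term,        */
   /*                           with no main effects at all.                 */
   /*   param1|param2|param3    expands to all main effects plus every       */
   /*                           two-way and three-way interaction.           */
   /*   param1|param2|param3@2  same expansion, capped at two-way terms.     */

   random _residual_ / subject=id;
run;
```

With ten continuous predictors, the full bar expansion would produce 1023 effects, which is why the main-effects form is the sensible starting point.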