plf515
Lapis Lazuli | Level 10

Hello

 

In the typical repeated measures problem, the DV changes over time but the IVs stay the same. In my problem, I have an IV that changes.  I cannot be completely explicit because the problem is sensitive but here is an explanation.

 

I have 5 subjects.  Each was repeatedly (over a great many trials) told to do something or not do it. The IV is whether they were told to do it or not (dichotomous).  The DV is whether they did it or not (also dichotomous).

 

We are not interested in intersubject differences or in the effects of time; the simple method was just to ignore these and do a table and chi-square.  But different people might be better or worse at following the instructions, so I thought a multilevel model would be good.  But once I started trying to code, I realized I did not know how to do this.

 

Using SAS 9.4.

 

Any ideas?

 

Thanks


peter

7 REPLIES 7
SteveDenham
Jade | Level 19

Peter, this design made my brain hurt.  Let me see if the following fits: concordance of the IV and DV in 5 subjects.  One thing that I keep coming back to is the small number of subjects, which is going to make any GLMM harder to fit.  What about a derived binary response, such as the absolute value of the difference between your current IV and DV (essentially, the subject either complies with the instruction or does not)?  I don't know how many cases you will see of not being told/subject does, but perhaps some kind of external cue exists and the subjects habituate.  There are probably plenty of being told/subject does not, and plenty of agreeing situations.  If you try to fit this as a binomial (or multinomial, but I'm trying to avoid that for now) response over time, with subject as a random effect, it would be something like:

 

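/* Untested sketch: agreement over time with subject-level random effects */
proc glimmix data=have method=laplace;
   class subjectID;
   effect spl = spline(time);       /* spline basis in time */
   model agree = / dist=binary;     /* intercept-only fixed part */
   random intercept spl / subject=subjectID solution;  /* per-subject solutions */
   /* lsmeans ... : incomplete; LSMEANS needs a fixed classification effect in the MODEL */
run;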

This is completely untested, and may need a lot of fixing up to work.  I hope this would give empirical Bayes estimates at the knots of the spline, but I'm not sure.

 

This gets at the temporal changes, but I'm not sure if it addresses your concerns otherwise, the more I think about this.
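
The derived response described above could be built along these lines. This is only a sketch: it assumes a long-format data set `have` with one row per trial, and the 0/1 variable names `told` (instruction) and `did` (response) are hypothetical, since the original post does not name its variables.

```sas
/* Hypothetical names: told = instruction (0/1), did = response (0/1) */
data have_agree;
   set have;
   disagree = abs(told - did);   /* 1 = response differs from instruction */
   agree    = 1 - disagree;      /* 1 = response matches instruction */
run;

/* Cell counts per subject, e.g. to see how rare told=0/did=1 is */
proc freq data=have_agree;
   tables subjectID*told*did / nocol nopercent;
run;
```

The PROC GLIMMIX step above would then read `data=have_agree`. Note that the absolute difference itself codes disagreement, so `agree` is its complement.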

 

Steve Denham

DLBarker
Fluorite | Level 6

Are you familiar with the game theory research on repeated prisoner's dilemma problems?  Look into TF2T vs. TFT research on the repeated game and the implied reputation correction functions.  The reason this is important is that it speaks to the true nature of your problem, and why a multilevel model may not work.  @SteveDenham is probably not going to like my response to this, but you should not attempt to use multilevel modeling when there are likely to be lag Y and X dependencies (unless you can rule them out).  The mathematics and interpretation can get very bogged down, because the lag information will capture random-effect information, more so when both sides are dichotomous.

Behavioral models such as this are virtually impossible to estimate accurately due to the number of potential dependencies (Is Y1,2 dependent on Y1,1?  Is Y1,2 dependent on X1,1?  Is Y1,2 dependent on X1,1*X1,2 or Y1,1*X1,2?).  Additionally, you may have a misspecification problem.  Is doing/complying different than doing/not-complying and not-doing/complying different than not-doing/not-complying?  This may not actually be a truly dichotomous outcome variable, but you can control for this by using additional predictors.

 

You have a multiclass problem that you need to control for.  You are correct in searching for a multilevel model (but explicit use of one may ignore the true nature of your problem), even if you have each subject over many trials.  You simply need to observe all of the possible dependencies.  You have a frequency of instruction issue, a frequency of response (repeated event) issue, a consecutive event issue, etc.  Suppose I have a subject that I request to do something 2 times, and not to do it the next 3.  This subject never does the thing.  However, presume that for the same subject I had requested them to do it 3 consecutive times, then not to do it the next 2 times.  This time, on the third instruction, they do as requested.  Not only is frequency of instruction a potential concern, but so is consecutive frequency, since you have a behavioral bias problem.  Additionally, not complying last time may impact my decision to comply this time.  From your one IV and observed outcomes, you now have a broad set of additional dichotomous IVs that must be ruled out first.

 

Once you have ruled out the significance of frequency and consecutive frequency concerns and prior outcome dependence, then you can apply multilevel modeling if you wish (provided that lag Y information and interactions of lag Y are not significantly predictive).  My guess is that once you control for all of these potential dependencies, you will already have a multilevel model (in essence) without using a multilevel approach.  Using lags of y and x and their interactions as dichotomous predictors will pick up most of the random effects.  This is because purely boolean LHS/RHS equations also have boolean lags and interactions.  Hence, when both the DV and the predictive variables are dichotomous, random effects can often be picked up within the model via additional dichotomous variables (more accurately, though less explicitly, and it may be more difficult to interpret).  Otherwise, simply use dichotomous regression to rule out some of these potential dependencies, then structure a multilevel model with the retained information.  It seems by my reading that you are overcomplicating and oversimplifying your problem at the same time.
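
As a rough illustration of the lag approach described above, here is one way the first-order lags and a single interaction might be set up. It is a sketch under assumptions: the data set `have` is taken to be long format with one row per trial, `time` is assumed to order trials within subject, and the 0/1 variable names `told` and `did` are hypothetical.

```sas
/* Hypothetical names: told, did (0/1); time orders trials within subject */
proc sort data=have out=sorted;
   by subjectID time;
run;

data lagged;
   set sorted;
   by subjectID;
   /* call LAG on every row so its queue advances consistently */
   lag_told = lag(told);
   lag_did  = lag(did);
   /* the first trial of each subject has no previous trial */
   if first.subjectID then do;
      lag_told = .;
      lag_did  = .;
   end;
run;

/* Ordinary logistic regression with lagged terms and one interaction */
proc logistic data=lagged;
   model did(event='1') = told lag_told lag_did told*lag_did;
run;
```

Each subject's first trial drops out of the fit because its lags are missing. Higher-order lags and run-length counters could be added in the same data step if the first-order terms turn out to matter.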

 

 

SteveDenham
Jade | Level 19

@DLBarker, your comments regarding lags bring back memories of fitting systems of PDEs to physiological data.  I agree that when you have a well defined model, either deterministic or stochastic, you can avoid random effects in many cases.  But a lot of the time, there is not yet a good process to model--all that you have is subjects randomly assigned (or sort of randomly assigned) to a treatment.  In this case, I don't think there is a process that encompasses all of the elements you bring out--what we have is what the client brought in the door.  The best I could offer, without additional variables, is a way to see if the trajectory of concordance between stimulus and response looks the same for the five subjects.  You would probably do as well with just a time plot, and then maybe move to a panel approach to a nonstationary process (assuming there is some kind of learning going on).

 

Also, consider that the use of random effects broadens the inference space.  If everything is considered a fixed effect, you can only infer to future realizations of identical situations--in this case, the same 5 subjects.  Not really as interesting as being able to infer to future realizations for all possible subjects.
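
For comparison, a random-intercept version of the original question (does the instruction predict the response?) might look like the sketch below. The 0/1 variable names `told` and `did` are hypothetical, and with only five subjects the subject variance will be poorly estimated, so treat this as a starting point that may not converge.

```sas
/* Hypothetical names: told = instruction (0/1), did = response (0/1) */
proc glimmix data=have method=laplace;
   class subjectID;
   /* fixed effect of instruction on the log-odds of doing it */
   model did(event='1') = told / dist=binary link=logit solution;
   /* subject-specific shift in baseline tendency to do it */
   random intercept / subject=subjectID;
run;
```

Here subjects are treated as a sample from a larger population, which is what allows inference beyond these five.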

 

Steve Denham

plf515
Lapis Lazuli | Level 10

I am glad that I am not the only one puzzled by this design! 

 

What I am doing, so far, is just a regular logistic regression controlling for subject, plus separate chi-squares for each subject.  This seems to work; I just hope I am not ignoring something important.
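
In SAS terms, that approach could look roughly like this. The data set name `have` and the 0/1 variables `told` (instruction) and `did` (response) are assumed names for illustration only.

```sas
/* Logistic regression with subject as a fixed classification effect */
proc logistic data=have;
   class subjectID / param=ref;
   model did(event='1') = told subjectID;
run;

/* Separate 2x2 table and chi-square test for each subject */
proc sort data=have out=by_subject;
   by subjectID;
run;

proc freq data=by_subject;
   by subjectID;
   tables told*did / chisq;
run;
```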

DLBarker
Fluorite | Level 6
You will likely be fine.

http://statisticalhorizons.com/wp-content/uploads/Allison.SM82.pdf

Page 90 of the above describes a repeated event framework using discrete event-times and repeatable events. It also points out some concerns and fixes in that section. Don't get too lost in the "hazard rate" definition of the approach; it is mathematically equivalent to dichotomous event histories.

Be careful not to think of lags as only having information due to time. Lags can have importance because they pick up type information. As you said in describing the problem, some subjects may be better at following directions than others. Carrying these lags will help pick up that information, even though it is not a time effect. As I mentioned, sometimes lagged dichotomous information is sufficient to replace a pure multilevel model.
SteveDenham
Jade | Level 19

If the queues for each subject are long enough, those PROC FREQs should also give you some pretty good estimates of Stuart's tau-c and its confidence bounds.  I would like those intervals as a measure of agreement between request and response (I still don't know what the right vocabulary for these would be).
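
A minimal sketch of how those estimates could be requested, assuming a data set `have` with hypothetical 0/1 variables `told` and `did`:

```sas
/* Hypothetical names: told = request (0/1), did = response (0/1) */
proc sort data=have out=by_subject;
   by subjectID;
run;

proc freq data=by_subject;
   by subjectID;
   /* MEASURES includes Stuart's tau-c; CL adds confidence limits */
   tables told*did / measures cl;
run;
```

The confidence limits are asymptotic, so they are only as trustworthy as the per-subject trial counts allow.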

 

Steve Denham

plf515
Lapis Lazuli | Level 10

Thanks. I don't think game theory and lags are relevant here.  The time intervals are very short (1/10 of a second).  There also shouldn't be any lag effects.
