Hi, thank you all for taking the time to review my question. I use PROC GLIMMIX to run a differential item functioning (DIF) analysis. Basically, I put all the items, the interaction between item and an individual-level group variable (e.g., sex), the interaction between item and a school-level group variable (e.g., tech), and the cross-level interaction among item, sex, and tech into PROC GLIMMIX. If the p-value of the interaction between a specific item and a specific group variable is significant, the item shows DIF. For example, if q1*sex is significant, then item 1 is a DIF item.
In my estimation, I found that one item had a significant cross-level interaction (e.g., q30*sex*tech). So the question is, how do I write the code to obtain the difficulty (parameter estimate) of item 30 for each group? For example, for item 30, when sex=0, tech=0, the parameter estimate is A; when sex=1, tech=0, it is B; when sex=0, tech=1, it is C; when sex=1, tech=1, it is D. What I want are the estimates A, B, C, and D. I am not sure whether I have expressed myself clearly. I have put the code I use below. Thank you again!
proc glimmix data=final_clean method=laplace;
class idschool idstud itsex tech;
model response (Event='1')= q1-q32 itsex*q1 itsex*q2 itsex*q3 itsex*q4 itsex*q5 itsex*q6 itsex*q7 itsex*q8 itsex*q9
itsex*q10 itsex*q11 itsex*q12 itsex*q13 itsex*q14 itsex*q15 itsex*q16 itsex*q17 itsex*q18 itsex*q19 itsex*q20 itsex*q21
itsex*q22 itsex*q23 itsex*q24 itsex*q25 itsex*q26 itsex*q27 itsex*q28 itsex*q29 itsex*q30 itsex*q31 itsex*q32 tech*q1 tech*q2 tech*q3 tech*q4 tech*q5 tech*q6 tech*q7 tech*q8 tech*q9
tech*q10 tech*q11 tech*q12 tech*q13 tech*q14 tech*q15 tech*q16 tech*q17 tech*q18 tech*q19 tech*q20 tech*q21
tech*q22 tech*q23 tech*q24 tech*q25 tech*q26 tech*q27 tech*q28 tech*q29 tech*q30 tech*q31 tech*q32 itsex*tech*q1 itsex*tech*q2 itsex*tech*q3 itsex*tech*q4 itsex*tech*q5 itsex*tech*q6 itsex*tech*q7
itsex*tech*q8 itsex*tech*q9 itsex*tech*q10 itsex*tech*q11 itsex*tech*q12 itsex*tech*q13 itsex*tech*q14 itsex*tech*q15 itsex*tech*q16 itsex*tech*q17 itsex*tech*q18 itsex*tech*q19 itsex*tech*q20 itsex*tech*q21
itsex*tech*q22 itsex*tech*q23 itsex*tech*q24 itsex*tech*q25 itsex*tech*q26 itsex*tech*q27 itsex*tech*q28 itsex*tech*q29 itsex*tech*q30 itsex*tech*q31 itsex*tech*q32
/ Dist=Binary link=logit solution noint ;
random intercept / subject=idschool type=vc;
random intercept / subject=idstud(idschool) type=vc;
run;
For the simple case you present, you can use an LSMEANS statement:
lsmeans itsex*tech*q30/ilink;
The ILINK option reports the means on the original (probability) scale in addition to the logit scale. One issue I see is that the model you are using treats all of the q variables as continuous. I suspect that you have dummy variables in your dataset to accomplish this. Wouldn't it be easier to have a categorical variable called "question" with 32 levels, where each record has a single entry for question ranging from 1 to 32, rather than 32 variables coded 0/1?
SteveDenham
Hi, SteveDenham,
Thank you for your help, I really appreciate it. I tried your code and, as you suspected, it didn't work, because only CLASS variables are allowed in this effect. Basically, q1-q32 are item indicators (most of them are 0; only the one that matches the item for that row is 1), and I have to use this long format to build the multilevel structure. I am not sure I understand your categorical "question" variable. Do you think I should put q1-q32 in the CLASS statement?
Putting q1 through q32 in the class statement is one way of doing this, but a better way would be to restructure your data so that the variable 'question' took on the values 1, 2, 3, ... , 31, 32. Currently, I envision your data to look something like this for each line:
idschool idstud itsex tech event q1 q2 q3 ... q32
In this case, idschool has multiple levels as does idstud. Itsex has 2 levels (presumably), event has 2 levels, tech has 2 levels (for the sake of illustration) and q1 through q32 each have 2 levels, such that for school=6 and student=100 who is male and had events for question 1, 30 and 32, the data would look like:
6 100 M 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
or something along those lines. What I would suggest is a "long" version for variables idschool, idstud itsex tech question event:
6 100 M 1 1 1
6 100 M 1 2 0
6 100 M 1 3 0
<keeps repeating to >
6 100 M 1 29 0
6 100 M 1 30 1
6 100 M 1 31 0
6 100 M 1 32 1
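If the data already have one row per student-item response with q1-q32 as 0/1 item indicators (the layout the original poster describes), the single "question" variable can be derived in a short DATA step. This is only a sketch under that assumption: the output dataset name final_long and the loop index i are made up for illustration, and it assumes exactly one of q1-q32 equals 1 on each row.

```sas
/* Sketch: collapse 32 item indicators into one categorical variable.   */
/* Assumes each row has exactly one of q1-q32 equal to 1.               */
/* final_long is a hypothetical name for the restructured dataset.      */
data final_long;
  set final_clean;
  array q{32} q1-q32;
  question = .;                      /* stays missing if no indicator is 1 */
  do i = 1 to 32;
    if q{i} = 1 then question = i;   /* record which item this row is     */
  end;
  drop i q1-q32;
run;

/* Quick check: every row should have a question value from 1 to 32. */
proc freq data=final_long;
  tables question / missing;
run;
```

If the data were instead in the wide layout shown above (one row per student, with q1-q32 holding the responses), the loop would need an OUTPUT statement inside it to write one row per question.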
With your data structured in this format, your GLIMMIX code would simplify to:
proc glimmix data=final_clean method=laplace;
class idschool idstud itsex tech question;
model response (Event='1')=itsex|tech|question
/ Dist=Binary link=logit solution noint ;
random intercept / subject=idschool type=vc;
random intercept / subject=idstud(idschool) type=vc;
lsmeans itsex*tech*question/ilink;
run;
The lsmeans statement will yield 128 probabilities. Down near the bottom of each sex by tech section, there will be an estimate for Question 30.
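To avoid scrolling through all 128 rows, one option is to capture the LSMEANS table with ODS OUTPUT and keep only the rows for question 30. The sketch below assumes the restructured data are in a dataset called final_long (a hypothetical name) and that question is stored as a numeric variable; lsm_all is also a made-up name. In the captured table, the Estimate column is on the logit scale and Mu is the corresponding probability from ILINK.

```sas
/* Sketch: save the least squares means and print only item 30.        */
/* final_long and lsm_all are hypothetical dataset names.              */
proc glimmix data=final_long method=laplace;
  class idschool idstud itsex tech question;
  model response (Event='1') = itsex|tech|question
        / dist=binary link=logit solution noint;
  random intercept / subject=idschool type=vc;
  random intercept / subject=idstud(idschool) type=vc;
  lsmeans itsex*tech*question / ilink;
  ods output LSMeans=lsm_all;        /* capture the LSMEANS table */
run;

/* Four rows expected: one per itsex-by-tech combination.              */
/* If question comes through as character, use where question = '30'. */
proc print data=lsm_all noobs;
  where question = 30;
  var itsex tech question Estimate StdErr Mu StdErrMu;
run;
```

These four rows correspond to the A, B, C, and D asked about in the original question, expressed as logits (Estimate) and as probabilities of a correct response (Mu).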
Now if you want to compare these values, see the many posts by @StatDave regarding the use of the %NLmeans macro to get the differences. The documentation for the %NLmeans macro is in this note: https://support.sas.com/kb/62/362.html
SteveDenham