For an analytical epidemiology course I have to run PROC MIXED to test for a change in cholesterol level. The cholesterol level was measured at three time points: 1960, 1965 and 1970. We got some tips on how to solve this; I've copied that text below. The dataset is also added as an attachment.
I've made the following syntax:
PROC MIXED DATA=ELEARN.zutphen_broad;
CLASS RN;
MODEL TOTCHOL1 TOTCHOL2 TOTCHOL3 = RN /Solution;
RANDOM intercept/ Subject=RN;
RUN;
Which gives me the error:
NOTE: PROCEDURE MIXED used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
298 PROC MIXED DATA=ELEARN.zutphen_broad;
299 CLASS RN;
300 MODEL TOTCHOL1 TOTCHOL2 TOTCHOL3 = RN /Solution;
--------
73
202
ERROR 73-322: Expecting an =.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
301 RANDOM intercept/ Subject=RN;
302 RUN;
What am I doing wrong? I'm not skilled enough with SAS to work out how to fix this. The hints below also cover later questions; right now I'm stuck at question A (with hint A), which just asks for the regression coefficient and the p-value for the change from 1960 to 1970.
a) You can use PROC MIXED to conduct an analysis with a random intercept model.
PROC MIXED DATA=;
CLASS <subject identifier and categorical variables (if any)>;
MODEL <dependent variables> = <independent variables> /Solution;
RANDOM intercept/ Subject= <subject identifier>;
RUN;
In the first part of the assignment, the independent variable is time as a continuous variable.
b) In the output, you can find the slope of the regression under estimate in the table “solution of fixed effects”.
e) add TIME(ref=FIRST) to the class statement in order to model TIME as a categorical covariate. (ref=FIRST) makes TIME=0 into the reference class.
f) look at the regression results and use your common sense.
g) You can add a quadratic term of time by adding TIME*TIME as an independent variable.
If you do so, the (linear) term TIME should ALSO be in the model. Otherwise, TIME*TIME will pick up both the linear and the quadratic effect.
However, testing correctly whether this term is statistically significant is not trivial.
The p-value given in the table “solution of fixed effects” might be too high because TIME and TIME*TIME are strongly correlated.
h) The correct way of testing is a likelihood ratio test, but in SAS you need to perform this test “by hand”. This means running the model with
totchol = TIME,
and also the model with
totchol=TIME TIME*TIME ,
USING MAXIMUM LIKELIHOOD, that is, using
PROC MIXED DATA=... method=ML ;
Find the “-2 Log Likelihood” values of the two models in the “Fit Statistics” table of the output, and use these in the next piece of SAS code:
data LRT;
ll1 = 117xx.x; ** fill in the -2 Log Likelihood value from model TIME;
ll2 = 117xx.x; ** fill in the -2 Log Likelihood value from model TIME TIME*TIME;
chi = ll1-ll2;
p = 1-probchi(chi,1);
run;
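The by-hand likelihood ratio test from hint h can also be sketched end to end, so the two fit statistics do not have to be typed in manually. This is only a sketch under assumptions: it presumes a long-format dataset (hypothetically named chol_long here) with one row per subject per measurement and the variables RN, TIME and totchol. The dataset names fit_lin and fit_quad and the variables m2ll_lin and m2ll_quad are made up for illustration, and the filter assumes that the ML row of the Fit Statistics table is labelled "-2 Log Likelihood".

```sas
/* Model 1: linear time only, fitted with maximum likelihood */
proc mixed data=chol_long method=ml;
  class RN;
  model totchol = time / solution;
  random intercept / subject=RN;
  ods output FitStatistics=fit_lin;   /* capture the Fit Statistics table */
run;

/* Model 2: linear plus quadratic time, also with maximum likelihood */
proc mixed data=chol_long method=ml;
  class RN;
  model totchol = time time*time / solution;
  random intercept / subject=RN;
  ods output FitStatistics=fit_quad;
run;

/* Difference in -2 log likelihood, compared with chi-square on 1 df */
data LRT;
  merge fit_lin (where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_lin))
        fit_quad(where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_quad));
  chi = m2ll_lin - m2ll_quad;   /* smaller model minus larger model */
  p   = 1 - probchi(chi, 1);    /* one extra parameter: TIME*TIME */
run;

proc print data=LRT;
  var m2ll_lin m2ll_quad chi p;
run;
```

Both models must be fitted with method=ML for this comparison, because REML likelihoods are not comparable between models with different fixed effects.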
Only one dependent variable is allowed in your PROC MIXED MODEL statement.
You might want to re-arrange the input data so that the three time periods are denoted by a variable (TIMEPERIOD=1 or TIMEPERIOD=2 or TIMEPERIOD=3), and then you have a single response named CHOLESTEROL, repeated measures on the variable TIMEPERIOD.
Examples are discussed here: https://documentation.sas.com/?docsetId=statug&docsetVersion=14.3&docsetTarget=statug_mixed_examples...
data re_arrange;
set ELEARN.zutphen_broad;
cholesterol = totchol1;
timeperiod=1;
output;
cholesterol=totchol2;
timeperiod=2;
output;
cholesterol=totchol3;
timeperiod=3;
output;
run;
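As a rough sketch of how the restructured data could then feed the random intercept model from hint a, the steps might look like the following. The dataset name chol_long, the loop variable period, and the coding of TIME as years since 1960 (0, 5, 10) are assumptions made for illustration. The array-based DATA step is meant as a compact alternative that produces the same kind of long layout, and it assumes the input data do not already contain variables named time, totchol or period.

```sas
/* Wide to long: one output row per subject per measurement year */
data chol_long;
  set ELEARN.zutphen_broad;
  array chol{3} TOTCHOL1 TOTCHOL2 TOTCHOL3;
  do period = 1 to 3;
    totchol = chol{period};
    time    = 5 * (period - 1);   /* assumed coding: years since 1960 */
    output;
  end;
  keep RN time totchol;
run;

/* Random intercept model with time as a continuous covariate */
proc mixed data=chol_long;
  class RN;                        /* subject identifier only */
  model totchol = time / solution; /* a single dependent variable */
  random intercept / subject=RN;
run;
```

With this coding, the estimate for time in the "Solution for Fixed Effects" table would be the average change in cholesterol per year, so the change from 1960 to 1970 would be ten times that slope. Note that RN appears in the CLASS and RANDOM statements but not as an independent variable in the MODEL statement.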
Linear regression models (ANOVA, REG, GLM, PLS) permit multiple variables on the left-hand side of the model statement. Generalized linear models (LOGISTIC, GENMOD) and mixed models (FMM, MIXED, GLIMMIX) only support a single variable on the left-hand side.
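Once the data have a single response variable, the later hints (time as a categorical covariate, and a quadratic term) would change only the CLASS and MODEL statements. A minimal sketch, assuming a long-format dataset hypothetically named chol_long with the variables RN, TIME and totchol, where TIME=0 marks the first measurement:

```sas
/* Hint e: time as a categorical covariate, first level as reference */
proc mixed data=chol_long;
  class RN time(ref=first);
  model totchol = time / solution;
  random intercept / subject=RN;
run;

/* Hint g: continuous time with an added quadratic term */
proc mixed data=chol_long;
  class RN;                        /* time is NOT in CLASS here */
  model totchol = time time*time / solution;
  random intercept / subject=RN;
run;
```

In both cases the left-hand side of the MODEL statement still holds exactly one variable.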