I want to test whether the full linear regression model, which includes the group factor, fits significantly better than the reduced model, but I can't get the test statistic with the code below:
/* Creating sample data */
data sample_data;
input cost agegroup $ gender $ regions $ race $ group ;
datalines;
100 20-30 Male East Asian 1
150 30-40 Female West Black 0
200 40-50 Male South White 1
250 20-30 Female East Hispanic 1
300 30-40 Male West White 0
;
run;
/* Fit the reduced model without the group indicator */
proc glm data=sample_data;
class agegroup gender regions race;
model cost = agegroup gender regions race / solution;
ods output ParameterEstimates=Model1_out;
run;
/* Fit the full model with group */
proc glm data=sample_data;
class agegroup gender regions race group;
model cost =agegroup gender regions race group / solution;
ods output ParameterEstimates=Model2_out;
run;
/* Calculate the log-likelihood for each model */
data lst;
set Model1_out;
loglik1 = -0.5 * (_N_ * log(2 * constant('pi')) + _SSE_);
run;
data Model2_out;
set Model2_out;
loglik2 = -0.5 * (_N_ * log(2 * constant('pi')) + _SSE_);
run;
/* Calculate the likelihood ratio test statistic */
data lst;
set Model1_out;
loglik2 = loglik2;
LR = -2 * (loglik1 - loglik2);
p_value = 1 - cdf('CHISQUARE', LR, 1);
run;
First, that is far too little data for the size of the model: with five observations there are not enough degrees of freedom to estimate all of the parameters. As for the likelihood ratio test, you should use a procedure that fits the model by maximum likelihood. PROC GLM uses least squares, not maximum likelihood. With a much simpler model, or with many more data points, you can get a likelihood ratio test easily in PROC GENMOD, which uses maximum likelihood estimation, by including the TYPE3 option. For example:
proc genmod data=sample_data;
class gender group;
model cost =gender group / type3;
run;
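If you want the statistic from an explicit comparison of the two fits rather than from the TYPE3 table, the following is a minimal sketch, not a tested solution. It assumes a data set large enough to estimate every parameter (the name mydata is hypothetical), that group has two levels so the test has 1 degree of freedom, and that the ModelFit ODS table labels the row of interest 'Log Likelihood'; check the Criterion values in your own output before relying on the WHERE clause.

```sas
/* Reduced model: no group effect. Save the fit statistics. */
ods output ModelFit=fit_reduced;
proc genmod data=mydata;   /* mydata is a hypothetical data set name */
  class agegroup gender regions race;
  model cost = agegroup gender regions race / dist=normal link=identity;
run;

/* Full model: adds group. Save the fit statistics. */
ods output ModelFit=fit_full;
proc genmod data=mydata;
  class agegroup gender regions race group;
  model cost = agegroup gender regions race group / dist=normal link=identity;
run;

/* Keep only the log-likelihood row from each fit and compare them.    */
/* Assumption: the Criterion label is exactly 'Log Likelihood'.        */
data lrt;
  merge fit_reduced(keep=Criterion Value rename=(Value=ll_reduced)
                    where=(Criterion='Log Likelihood'))
        fit_full   (keep=Criterion Value rename=(Value=ll_full)
                    where=(Criterion='Log Likelihood'));
  LR = 2 * (ll_full - ll_reduced);   /* likelihood ratio statistic      */
  df = 1;                            /* assumption: group has 2 levels  */
  p_value = 1 - cdf('CHISQUARE', LR, df);
run;

proc print data=lrt noobs;
  var ll_reduced ll_full LR df p_value;
run;
```

For a two-level group, the LR value computed this way should agree with the likelihood ratio statistic that the TYPE3 option reports for group in the full model, so the TYPE3 output alone is usually enough.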
Thank you! Sorry for the small example data.
I need to include all of the demographic variables in the model.
Can I use the code below to compare the two models?
proc genmod data=data ;
class Agegrp gender race region group;
model cost = Agegrp gender race region group / dist=normal link=identity type3;
output out=model2_out stdresdev=stdresdev p=predicted;
run;
proc genmod data=model2_out;
class Agegrp gender race region group;
model cost = Agegrp gender race region group / dist=normal link=identity type3;
contrast 'Wald Test for Model Comparison' Agegrp gender race region 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 / e Wald;
run;