Hi,
I want to estimate the effect of years of schooling (yrschool, a continuous, non-negative variable) on salary (incwage, also continuous and non-negative), controlling for age. The dataset includes several surveys from different countries and different years (surveys are identified by the variable "sample"). I therefore think the best model would be a Poisson regression with a random intercept.
According to the SAS documentation, I should use the BGLIMM procedure:
http://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_bglimm_examples03.htm
My code is:
proc bglimm data=test.reduced3 seed=10571042 nmc=10000;
class sample;
model incwage = yrschool age / dist=poisson;
random int / sub=sample;
run;
However, I get the following error message: "ERROR: PROC BGLIMM failed to generate samples from the posterior distribution."
How can I fix this?
With the same dataset, I can fit an ordinary Poisson model without a random intercept using PROC GENMOD.
You may need an offset (see Example 3 in the BGLIMM documentation). I would also recommend including an outpost= option in the PROC BGLIMM statement. The error message could be caused by having too many levels of sample as compared to the number of observations. Have you tried fitting the fixed effects model in BGLIMM rather than GENMOD? Or tried a frequentist approach using PROC GLIMMIX? The latter may have better diagnostics for what is going on.
SteveDenham
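A minimal sketch of two of the checks suggested above, reusing the dataset and variable names from the question. The output dataset name work.post_fixed is a hypothetical placeholder. The first step counts observations per level of sample, to see whether some surveys contribute very few rows. The second refits the fixed-effects-only model in PROC BGLIMM and saves the posterior draws with outpost=.

```sas
/* Observations per survey: many sparse levels of sample can make
   the random-intercept model hard to sample from. */
proc freq data=test.reduced3;
   tables sample / nocum nopercent;
run;

/* Fixed-effects-only model in BGLIMM (no RANDOM statement), keeping
   the posterior draws. work.post_fixed is a placeholder name. */
proc bglimm data=test.reduced3 seed=10571042 nmc=10000
            outpost=work.post_fixed;
   model incwage = yrschool age / dist=poisson;
run;
```

If the fixed-effects model also fails, the problem probably lies with the response distribution or the data rather than with the random intercept.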
Wage doesn't seem to me like a variable that would follow a Poisson distribution as it usually is not a discrete count. Since it is considered continuous, you might want to try an exponential distribution or a gamma distribution here and in GENMOD.
SteveDenham
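As a rough illustration of that suggestion, a gamma model with a log link in PROC GENMOD might look like the sketch below. It reuses the names from the question and has no random effect yet. The gamma distribution has support only on strictly positive values, so zero wages would need separate handling.

```sas
/* Gamma regression with a log link, fixed effects only.
   Assumes incwage > 0 for the observations of interest. */
proc genmod data=test.reduced3;
   model incwage = yrschool age / dist=gamma link=log;
run;
```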
Thanks for the advice. You are right: after checking, the variable follows a gamma distribution. However, it seems PROC GENMOD does not allow random effects. Are there other procedures that can fit a gamma regression model with random effects (or any model that can simultaneously handle data from multiple surveys in multiple countries)?
You can model a gamma-distributed response with random effects in PROC GLIMMIX or PROC NLMIXED. The log link function is generally used. If your model is linear on the log scale, use GLIMMIX; if it is nonlinear, use NLMIXED.
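A minimal sketch of the GLIMMIX version, assuming the dataset and variable names from the original question and a linear predictor on the log scale. The choice of method=laplace is an assumption made here, not something stated in the thread. The default pseudo-likelihood estimation would also run.

```sas
/* Gamma regression with a log link and a random intercept per survey.
   method=laplace is an illustrative choice, not a requirement. */
proc glimmix data=test.reduced3 method=laplace;
   class sample;
   model incwage = yrschool age / dist=gamma link=log solution;
   random intercept / subject=sample;
run;
```

With a log link, the exponentiated coefficient of yrschool can be read as the multiplicative change in expected wage per additional year of schooling, holding age fixed within a survey.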