Hi,
I want to build a model predicting occupation in 4 categories (occ_reduced: high-skilled, medium-skilled, low-skilled, unemployed) from individual characteristics (edu, sex, agegr) and the demand for high-skilled jobs (norm_demH) and for low- and medium-skilled jobs (norm_demML), which are year- and country-specific variables. The dataset combines surveys from different countries and different years. Each survey is identified by the variable "strate".
I first built a multinomial logit model with proc logistic, which gives consistent parameters:
proc logistic data=lab.sample outmodel=lab.occup_model2;
class edu(ref='2') imm_var(ref='0') sex / param = ref;
model occ_reduced(ref='UNEM')=edu sex agegr norm_demH norm_demML / link = glogit;
weight weight_samp /norm;
where labour=1;
run;
However, I feel I should use a multilevel model since norm_demH and norm_demML are survey-specific variables. I tried several options in proc glimmix, but the model never converges. The latest attempt looks like this:
proc glimmix data=lab.sample INITGLM;
class occ_reduced agegr SEX strate ;
model occ_reduced(ref='UNEM') =edu sex agegr norm_demH norm_demML/ solution dist=MULTINOMIAL link=glogit;
random intercept / subject=strate GROUP=occ_reduced;
weight weight_samp;
where labour=1;
NLOPTIONS TECH=NRRIDG MAXITER=100 ;
run;
I'm not very used to this kind of multilevel model, so maybe there is something I'm missing. Any tips?
What happens if you add the METHOD=QUAD or METHOD=LAPLACE option in the PROC GLIMMIX statement?
If that does not help, what non-convergence messages did you get before and after adding one of these options?
Thanks,
Jill
I get this error message.
118 proc glimmix data=lab.sample METHOD=QUAD;
119 title "Random intercept";
120 class occ_reduced agegr SEX strate ;
121 model occ_reduced(ref='UNEM') =edu sex agegr rdemH rdemM rdemL/ solution dist=MULTINOMIAL
121! link=glogit;
122 random intercept / subject=strate GROUP=occ_reduced;
123 weight weight_samp;
124 where labour=1;
125 run;
NOTE: Some observations are not used in the analysis because of: zero or negative weight (n=112144),
missing weight (n=10511).
NOTE: PROC GLIMMIX is fitting a model for nominal (unordered) data. This type of model contrasts each
response level against a reference level (occ_reduced='UNEM').
ERROR: Infeasible parameter values for evaluation of objective function with 1 quadrature point.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 2:52.39
cpu time 2:50.72
TIP: Include the entire LOG of the attempt. SAS often provides information that lets someone familiar with the procedure point toward a resolution. Copy the entire text from the LOG, including the code and all notes, messages, warnings, and details. Then, on the forum, open a text box using the </> icon that appears above the message window and paste the text.
The text box preserves the formatting of the log text and visually sets it apart from the discussion. Note: the code in the LOG sometimes differs from the code shared in the problem description, which is why we ask for the entire code from the log.
110 proc glimmix data=lab.sample;
111 title "Random intercept";
112 class occ_reduced agegr SEX strate ;
113 model occ_reduced(ref='UNEM') =edu sex agegr rdemH rdemM rdemL/ solution dist=MULTINOMIAL
113! link=glogit;
114 random intercept / subject=strate GROUP=occ_reduced;
115 weight weight_samp;
116 where labour=1;
117 run;
NOTE: Some observations are not used in the analysis because of: zero or negative weight (n=112144),
missing weight (n=10511).
NOTE: PROC GLIMMIX is fitting a model for nominal (unordered) data. This type of model contrasts each
response level against a reference level (occ_reduced='UNEM').
NOTE: Did not converge.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 25:38.21
cpu time 25:37.26
What is the sample size of your entire data set? I see that many observations were excluded from the analysis because of zero, negative, or missing weights. Alternatively, you could provide the size of the sample actually used by the GLIMMIX procedure.
The sample size of valid observations is 1,156,858.
| Number of Observations Read | 1279513 |
|---|---|
| Number of Observations Used | 1156858 |

Response Profile

| Ordered Value | occ_reduced | Total Frequency |
|---|---|---|
| 1 | HIGH | 418183 |
| 2 | LOW | 95326 |
| 3 | MED | 553169 |
| 4 | UNEM | 90180 |

In modeling category probabilities, occ_reduced='UNEM' serves as the reference category.

Dimensions

| Dimension | Value |
|---|---|
| G-side Cov. Parameters | 3 |
| Columns in X | 57 |
| Columns in Z per Subject | 3 |
| Subjects (Blocks in V) | 189 |
| Max Obs per Subject | 27313 |

Optimization Information

| Item | Value |
|---|---|
| Optimization Technique | Dual Quasi-Newton |
| Parameters in Optimization | 54 |
| Lower Boundaries | 3 |
| Upper Boundaries | 0 |
| Fixed Effects | Not Profiled |
| Starting From | GLM estimates |

The initial estimates did not yield a valid objective function.
You might add a PARMS statement to provide your own starting values for the covariance parameter estimates. It might take some trial and error.
I am also curious about two issues regarding user-supplied starting values.
(1) As I recall, maximum likelihood is the estimation method used here. Theoretically, if identifiability holds, the maximum likelihood estimator should be unique. So is it useful to try different starting values?
(2) I have tried providing starting values in the NLMIXED procedure and noticed that they affected the results. Starting values could therefore have a bearing on the final parameter estimates (which contradicts the theoretical result I stated in the previous paragraph and puzzles me). So would it not be subjective to provide starting values arbitrarily?
I tried to add parms but still no convergence.
135 proc glimmix data=lab.sample;
136 class occ_reduced agegr SEX strate ;
137 model occ_reduced(ref='UNEM') =edu sex agegr rdemH rdemM rdemL/ solution dist=MULTINOMIAL
137! link=glogit;
138 random intercept / subject=strate GROUP=occ_reduced;
139 weight weight_samp;
140 where labour=1;
141 parms;
142 run;
NOTE: Some observations are not used in the analysis because of: zero or negative weight (n=112144),
missing weight (n=10511).
NOTE: PROC GLIMMIX is fitting a model for nominal (unordered) data. This type of model contrasts each
response level against a reference level (occ_reduced='UNEM').
NOTE: Did not converge.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 25:50.28
cpu time 25:47.21
How do I choose the starting values? Should I just pick a random number?
As @jiltao has mentioned, the starting value (or values) is found by trial and error. I take that to mean providing them arbitrarily.
If supplying starting values is one of several steps in your analysis, and the values can be estimated in a preceding step, then you can use the parameter estimates from that preceding step as the starting values for the current one. This practice is adopted in SAS for Mixed Models, Second Edition (Littell, Milliken, et al.).
sample syntax:
parms (2) (3) (0.5) (1);
But the values would depend on your data so you need to make appropriate changes.
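As a hedged illustration for the model in this thread: the Dimensions table posted earlier reports 3 G-side covariance parameters (one random-intercept variance per non-reference level of occ_reduced), so the PARMS statement here would take three values rather than four. The value 0.5 is an arbitrary placeholder, not a recommendation, and there is no guarantee that it leads to convergence.

```sas
proc glimmix data=lab.sample;
  class occ_reduced agegr sex strate;
  model occ_reduced(ref='UNEM') = edu sex agegr rdemH rdemM rdemL
        / solution dist=multinomial link=glogit;
  random intercept / subject=strate group=occ_reduced;
  weight weight_samp;
  where labour=1;
  /* One starting value per G-side covariance parameter (3 in this model). */
  /* 0.5 is an arbitrary placeholder; adapt it to your data.               */
  parms (0.5) (0.5) (0.5);
run;
```

Note that a PARMS statement with no values, as in `parms;`, supplies no starting values, so it should not be expected to change the outcome.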
One approach is to fit a simpler model and hopefully it will converge. Then you can use the estimated values as the starting values for your model. For example,
proc glimmix data=lab.sample method=laplace;
class occ_reduced ;
model occ_reduced(ref='UNEM') =/ solution dist=MULTINOMIAL
link=glogit;
random intercept / subject=strate GROUP=occ_reduced;
where labour=1;
run;
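If the simpler model does converge, one possible way to carry its estimates forward is sketched below. It assumes the covariance parameter estimates are saved with ODS OUTPUT (table CovParms) and read back with the PDATA= option of the PARMS statement. The data set name work.cp_start is hypothetical, and the covariate names are taken from the logs posted in this thread. The sampling weights are left out to match the simpler model above.

```sas
/* Step 1: intercept-only model; save the covariance parameter estimates. */
/* This only produces usable estimates if the model converges.            */
proc glimmix data=lab.sample method=laplace;
  class occ_reduced strate;
  model occ_reduced(ref='UNEM') = / solution dist=multinomial link=glogit;
  random intercept / subject=strate group=occ_reduced;
  where labour=1;
  ods output CovParms=work.cp_start;   /* hypothetical data set name */
run;

/* Step 2: full model, starting from the estimates saved in step 1. */
proc glimmix data=lab.sample method=laplace;
  class occ_reduced agegr sex strate;
  model occ_reduced(ref='UNEM') = edu sex agegr rdemH rdemM rdemL
        / solution dist=multinomial link=glogit;
  random intercept / subject=strate group=occ_reduced;
  where labour=1;
  /* PDATA= reads starting values from the Estimate variable */
  parms / pdata=work.cp_start;
run;
```

Both steps have the same random-effects structure, so the saved table holds one estimate per covariance parameter of the full model.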
Thanks,
Jill
Thanks for the advice. I tried a model with no covariate, but it still does not converge.
183 proc glimmix data=lab.sample;
184 class occ_reduced agegr SEX strate ;
185 model occ_reduced(ref='UNEM') =/*edu sex agegr rdemH rdemM rdemL*// solution dist=MULTINOMIAL
185! link=glogit;
186 random intercept / subject=strate GROUP=occ_reduced;
187 weight weight_samp;
188 where labour=1;
189 parms /*(2) (3) (0.5) (1)*/;
190 run;
NOTE: Some observations are not used in the analysis because of: zero or negative weight (n=112144),
missing weight (n=10511).
NOTE: PROC GLIMMIX is fitting a model for nominal (unordered) data. This type of model contrasts each
response level against a reference level (occ_reduced='UNEM').
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 1:54.29
cpu time 1:51.84