RaquelVO
Calcite | Level 5

Hi All,

I have been trying to run PROC LCA on a sample of 4,300 cases, but I keep running into problems where the model will not fit. It appears to be a data-sparseness issue, but only a couple of my variables have missing data. Could it be because most of my variables are dummy variables? Any suggestions on how to fix this?

Here's the code I used:

PROC LCA DATA=raquel.homicide outest=raquel.homicidelca2a outparam=raquel.homicidelca2b outpost=raquel.homicidelca2c;
NCLASS 2;
ITEMS Age Age_Entry Schooling CR_B CR_H CR_W  CR_M CR_F item1 item2 item3 item4 item5 item6 item7 item8 item9 
item10 item11 item12 item13 item14 item15 item16 item17 item18 item19 item20 item21 item22 item23 item24 item25 item26 
item27 item28 item29 item30 item31 item32 item33 item34 item35 item36 item37 item38 item39 item40 item41 item42 item43 
item44 item45 item46 item47 item48 item49 item50 ;
CATEGORIES  84 83 13 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2;
ID idnumber;
RHO PRIOR=1;
Gamma Prior=1;
SEED 648521;
seed_draws 548961;
RUN;

And these are the warnings that keep popping up:

WARNING: The estimation engine was not able to fit the saturated model
          in order to adjust G-squared to account for missing data.
          This may be due to having a large number of response items.
          The G-squared, AIC and BIC fit statistics will NOT be provided in the output.
WARNING:  The estimation engine was not able to compute standard errors.
          Standard errors may not be supported for the kind of model
          and/or the kinds of parameter constraints which you are
          using, but may be available in future software releases.
          Please see the users' guide for details.

Thanks!!!

Cynthia_sas
SAS Super FREQ
Hi:
These procedures were not written by SAS. See Tech Support note http://support.sas.com/kb/30/623.html for a link to the website for the procedures.

Cynthia
Reeza
Super User

WARNING: The estimation engine was not able to fit the saturated model
          in order to adjust G-squared to account for missing data.
          This may be due to having a large number of response items.

There are a couple of issues in the log. I would address each one and see what happens. First, I assume you ran the test code and it worked?

Second, failing to fit the saturated model usually means that you don't have enough observations for the number of variables in your model.
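To put rough numbers on the sparseness: the saturated model needs one probability per possible response pattern, and the number of patterns is the product of the CATEGORIES values. The DATA step below is a back-of-the-envelope sketch, not PROC LCA output. It assumes the category counts from the posted CATEGORIES statement (84, 83 and 13 for Age, Age_Entry and Schooling, and 2 for each of the 55 other items in the ITEMS list: CR_B, CR_H, CR_W, CR_M, CR_F and item1 to item50), works on the log10 scale because the raw product is enormous, and counts the parameters of a 2-class model under the usual LCA parameterization: (C - 1) class prevalences plus C x sum(R_j - 1) item-response probabilities.

```sas
data _null_;
  /* Category counts assumed from the posted CATEGORIES statement:
     Age=84, Age_Entry=83, Schooling=13, plus 55 binary items
     (CR_B CR_H CR_W CR_M CR_F item1-item50) */
  n_binary = 55;
  n_class  = 2;      /* NCLASS 2 */
  n_cases  = 4300;

  /* Saturated model: one cell per possible response pattern (log10 scale) */
  log10_patterns = log10(84) + log10(83) + log10(13) + n_binary*log10(2);

  /* Standard LCA parameter count: (C-1) gammas + C*sum(R_j - 1) rhos */
  n_params = (n_class - 1)
           + n_class*((84-1) + (83-1) + (13-1) + n_binary*(2-1));

  log10_cases = log10(n_cases);

  put "log10(number of response patterns) = " log10_patterns 6.2;
  put "Parameters in the 2-class model    = " n_params;
  put "log10(number of cases)             = " log10_cases 6.2;
run;
```

With these inputs the log should show about 21.5 for the patterns (roughly 3.3 x 10^21 possible cells), 465 parameters, and about 3.6 for the 4,300 cases, so nearly every cell of the saturated table is empty.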

SAS procedures generally remove observations if any of the variables are missing. So how many records are left after you exclude the missing data?

I'm assuming you have 51 items, since you have 50 item variables?

RaquelVO
Calcite | Level 5

Hi, thanks for your help!

Let's go step by step:

- Yes, the test code worked. Every variation I tried on my sample I also tried on the test code, and it always worked there.

- The sample size is 4,300, and all the "item" variables have 4,300 recorded observations. These are all dichotomous variables, coded as 1 and 2 to comply with the PROC LCA requirements.

- Only two variables have missing values: one has 300 missing observations and the other 900, which still leaves me with many more observations than variables in the model. I did not exclude missing data from the model; I coded it as missing (.) and left it in. Should I have removed those cases or variables from the analysis?

- The "item" variables were renamed just for this post. They are in fact independent variables that measure different concepts, but because I am using data provided by a government agency, I removed the original names. I did include all of them in this model.

Again, thank you so much for your help! I am new at this, and no one in my department does LCA.
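A quick way to answer the missing-data questions raised in this thread is to count missing values per item and the number of cases that are complete on every item. This is a base SAS sketch (PROC MEANS plus a DATA step), assuming the items are numeric and named as in the posted ITEMS statement; item1-item50 is standard SAS numbered-range syntax and is used here only outside PROC LCA. Note that the first warning talks about adjusting G-squared "to account for missing data", which suggests PROC LCA keeps incomplete cases rather than dropping them, so it is worth checking the PROC LCA users' guide before deleting anything.

```sas
%let items = Age Age_Entry Schooling CR_B CR_H CR_W CR_M CR_F item1-item50;

/* N and number of missing values for every item */
proc means data=raquel.homicide n nmiss;
  var &items;
run;

/* Number of cases with no missing value on any item */
data _null_;
  set raquel.homicide end=last;
  complete + (nmiss(of &items) = 0);   /* sum statement: retained running count */
  if last then put "Cases with complete data on all items: " complete;
run;
```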

DWilson
Pyrite | Level 9

I suggest removing all the items with 2 levels and seeing whether PROC LCA provides a solution. If it does, start adding the 2-level items back in blocks until the model no longer converges. Include as many items as you can while still getting convergence.
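The first step of this suggestion could look like the sketch below. It reuses only statements that already appear in the original program, restricted to the three items with more than two categories and with the OUT*= options dropped; it has not been run against this data.

```sas
PROC LCA DATA=raquel.homicide;
  NCLASS 2;
  /* Step 1: only the items with more than 2 categories */
  ITEMS Age Age_Entry Schooling;
  /* One CATEGORIES value per item, in the same order as ITEMS */
  CATEGORIES 84 83 13;
  ID idnumber;
  RHO PRIOR=1;
  GAMMA PRIOR=1;
  SEED 648521;
  SEED_DRAWS 548961;
RUN;
```

If this converges, add the binary items back a block at a time (for example CR_B through CR_F, then item1 through item10, and so on), appending one 2 to CATEGORIES for each item added, and note the block at which convergence fails.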

 

Alternatively, run PROC CORR on all the items and check for perfect or near-perfect collinearity. If you find any, pick some items to exclude so that the reduced set has no perfect or near-perfect collinearity, then run your LCA on the reduced set and see whether it converges. If it doesn't, find the maximal set of items your LCA can handle, either by slowly increasing the number of items (forward selection) or by slowly decreasing it until you get convergence (backward selection).
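A sketch of the PROC CORR check, assuming the items are numeric and named as in the original ITEMS statement. For two binary items the Pearson correlation equals the phi coefficient, whatever the 1/2 coding. The DATA step lists each pair whose absolute correlation reaches a cutoff; the 0.9 cutoff and the data set names work.item_corr and work.high_corr are illustrative choices, not part of the advice above.

```sas
/* Pearson correlations among all items, saved to a data set */
proc corr data=raquel.homicide outp=work.item_corr noprint;
  var Age Age_Entry Schooling CR_B CR_H CR_W CR_M CR_F item1-item50;
run;

%let cutoff = 0.9;   /* hypothetical threshold for "near-perfect" */

/* One row per highly correlated pair */
data work.high_corr;
  set work.item_corr(where=(_TYPE_ = 'CORR'));
  array r{*} Age Age_Entry Schooling CR_B CR_H CR_W CR_M CR_F item1-item50;
  length item_a item_b $32;
  row + 1;                        /* position of this row's item in the VAR list */
  item_a = _NAME_;
  do j = row + 1 to dim(r);       /* upper triangle: each pair once, no diagonal */
    if abs(r{j}) >= &cutoff then do;
      item_b = vname(r{j});
      corr   = r{j};
      output;
    end;
  end;
  keep item_a item_b corr;
run;

proc print data=work.high_corr noobs;
run;
```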

 

 

RaquelVO
Calcite | Level 5

Thank you so much!!! Multicollinearity was definitely part of the problem!

Somehow the models now converge and the saturated model gets fitted, but SAS still can't produce degrees of freedom or standard errors, whether or not I remove all the dichotomous variables:

 

WARNING: The information matrix could not be inverted during standard error calculation.
          It was singular and/or not positive definite.
          This is sometimes due to either insufficient degrees of freedom or
          a boundary value estimate for a parameter.
          Valid standard errors could not be computed.

Again, thanks for the advice, it definitely brought me a step closer.
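The warning names a boundary value estimate as one possible cause. Item-response probabilities can be pushed to 0 or 1 when a response category has very few cases, so one way to look for the culprit is to list rare categories. This sketch captures PROC FREQ's one-way tables through ODS OUTPUT and prints the categories below a cutoff; the cutoff of 20 and the data set name work.item_freqs are hypothetical.

```sas
/* Capture every one-way frequency table in a single data set */
ods output OneWayFreqs=work.item_freqs;
proc freq data=raquel.homicide;
  tables Age Age_Entry Schooling CR_B CR_H CR_W CR_M CR_F item1-item50 / nocum;
run;

/* Show only the sparse categories */
proc print data=work.item_freqs noobs;
  where Frequency < 20;          /* hypothetical cutoff for a "rare" category */
  var Table F_: Frequency Percent;
run;
```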

DWilson
Pyrite | Level 9

Then take out all the variables, try the model with one variable, and see whether you get a solution. It seems you have some multicollinearity among the non-binary variables. They may need to be coarsened, and some may need to be dropped, to get the model to converge.
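One way to coarsen the two many-category items is to cut them into quantile groups with PROC RANK. This is a sketch, not the only reasonable choice: four groups, the output data set raquel.homicide_coarse, and the new variables age_grp and entry_grp are all hypothetical. Because PROC RANK numbers groups from 0 and the items in this thread are coded from 1 for PROC LCA, the DATA step shifts the groups up by one.

```sas
/* Split Age and Age_Entry into (roughly) quartile groups numbered 0-3 */
proc rank data=raquel.homicide out=raquel.homicide_coarse groups=4;
  var Age Age_Entry;
  ranks age_grp entry_grp;   /* hypothetical new variable names */
run;

data raquel.homicide_coarse;
  set raquel.homicide_coarse;
  /* Recode groups 0-3 to 1-4; missing values stay missing */
  if not missing(age_grp)   then age_grp   = age_grp + 1;
  if not missing(entry_grp) then entry_grp = entry_grp + 1;
run;

/* Check the new categories before using them in PROC LCA */
proc freq data=raquel.homicide_coarse;
  tables age_grp entry_grp / missing;
run;
```

In PROC LCA, age_grp and entry_grp would then replace Age and Age_Entry in ITEMS, with 4 4 replacing 84 83 in CATEGORIES. Heavy ties can leave fewer than four distinct groups, which the PROC FREQ check would show; in that case the groups need recoding to consecutive values and CATEGORIES adjusting to match.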

 

 
