Hi everyone, I'm fitting a multinomial logistic regression model using proc logistic in SAS with around 3.6 million observations, an outcome with 5 levels, and dozens of categorical predictors. I had no issues running either univariate or multivariate models with param = ref. However, once I switched to param = glm, the multivariate models started producing the warning "The information matrix is singular and thus the convergence is questionable. specifying a larger SINGULAR= value."

After doing some research, I found that this message can indicate multicollinearity in the model. I then tried a model with only 2 predictors, and it still gave the message, even though the correlation matrix showed no correlation between the two predictors.

As far as I know, the only difference between param = ref and param = glm is that param = glm uses less-than-full-rank coding, meaning that it creates k dummy variables for a predictor with k levels, whereas param = ref creates k-1. The two parameterizations should produce the same log-likelihood and estimates given the same reference level. To confirm this, I compared the results of the two models using only 2 predictors. Although param = glm throws the warning, its results are identical to those from param = ref, except that param = glm reports a bunch of zeros as the estimates for the reference level of each predictor. Is that the cause?

My question is: why does the param = glm model throw a warning while the param = ref model does not? And more importantly, in this situation, can I trust the param = ref results given that no warning was displayed for them?

I appreciate any advice and suggestions. Thank you in advance.
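To make the less-than-full-rank point concrete, here is a minimal sketch (not SAS output, just a toy illustration under stated assumptions) that builds an intercept plus dummy columns for one hypothetical 3-level predictor under both codings and computes the rank of each design matrix. The helper names `design_matrix` and `matrix_rank` and the toy data are made up for illustration. With one indicator per level, the indicator columns sum to the intercept column, so the matrix has more columns than its rank. The logistic information matrix has the form X'WX with positive diagonal weights W, so it has the same rank as X and would be singular in that case, regardless of how weakly correlated separate predictors are.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination using exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(levels, observed, coding):
    """Intercept plus dummy columns for one categorical predictor (hypothetical helper).

    coding="glm": one indicator per level (k columns, less than full rank).
    coding="ref": drop the last level as the reference (k-1 columns).
    """
    used = levels if coding == "glm" else levels[:-1]
    return [[1] + [1 if x == lev else 0 for lev in used] for x in observed]


levels = ["A", "B", "C"]
observed = ["A", "B", "C", "A", "C", "B"]  # hypothetical toy data

X_glm = design_matrix(levels, observed, "glm")
X_ref = design_matrix(levels, observed, "ref")

print("GLM:", len(X_glm[0]), "columns, rank", matrix_rank(X_glm))
print("REF:", len(X_ref[0]), "columns, rank", matrix_rank(X_ref))
```

This prints `GLM: 4 columns, rank 3` and `REF: 3 columns, rank 3`. Each additional GLM-coded predictor would add another redundant column, which is consistent with the zero estimates reported for the reference levels, though how SAS handles the redundancy internally is not shown by this sketch.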