Hi,
I would like to fit multinomial logit regressions with effect coding (-1 0 1), but the group sizes are unequal. Because of this, the intercepts correspond to the mean of the group means rather than the true grand mean. Is there a way in SAS to get adjusted intercepts that take the unbalanced data into account?
Thank you.
Are you fitting a generalized logit regression or a cumulative logit regression?
data class;
  set sashelp.class end=last;
  output;
  if last then do;
    sex='N'; weight=34.5; height=123.4; output;
    sex='N'; weight=134.5; height=23.4; output;
    sex='N'; weight=74.5; height=223.4; output;
    sex='N'; weight=44.5; height=93.4; output;
  end;
run;
proc logistic data=class;
  model sex=weight height / link=glogit equalslopes;
run;
Generalized logit regressions.
What does equalslopes stand for?
My model looks like this:
proc logistic;
class sex eduM3(ref='2')/param=effect ;
model edu3(ref='2')= sex eduM3 /link=glogit rsquare;
weight pond / norm;
where model=1;
run;
edu3 has 3 categories.
Some thoughts:
1) Both of the predictor variables in your MODEL statement are on a categorical scale. That is fine, if appropriate, but consequently there are no "slopes" and hence the discussion about the EQUALSLOPES option is moot. I'll add that if you check the SAS documentation (always a good idea), you'll see that the EQUALSLOPES option affects slopes associated with a continuous predictor, and has no impact on intercepts.
2) You have not provided enough information about your study. What is "pond" and how is it related to the study design? What is "model"? In general, if you want a good and appropriate answer to your question, you'll need to provide enough information.
3) Be sure that you understand how different coding systems work. See http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm The underlying model is the same, regardless of coding system. If you intend to interpret the parameter estimates, then you have to understand the coding system. If you do interpretation based on predicted values, then the coding system is moot because algebraically it all works out to the same thing.
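To illustrate point 3, here is a minimal sketch that fits the same generalized logit model under effect coding and under reference (dummy) coding, then compares the predicted probabilities. SASHELP.CARS, with Origin as the response and DriveTrain as the predictor, is only a stand-in for the real data, and the output data set names pred_effect and pred_ref are made up for this example.

```sas
/* Sketch: same generalized logit model under two coding systems.   */
/* SASHELP.CARS is a stand-in for the poster's data (assumption).   */
proc logistic data=sashelp.cars;
   class DriveTrain / param=effect;          /* -1 0 1 coding */
   model Origin(ref='USA') = DriveTrain / link=glogit;
   output out=pred_effect predprobs=individual;
run;

proc logistic data=sashelp.cars;
   class DriveTrain / param=ref;             /* 0 1 dummy coding */
   model Origin(ref='USA') = DriveTrain / link=glogit;
   output out=pred_ref predprobs=individual;
run;

/* IP_ variables hold the predicted probability of each Origin level. */
proc compare base=pred_effect compare=pred_ref
             method=absolute criterion=1e-6;
   var IP_Asia IP_Europe IP_USA;
run;
```

If the two codings really describe the same model, PROC COMPARE should report no differences in the predicted probabilities beyond the tolerance, even though the parameter estimates themselves differ.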
1) Ok.
2) I am not really sure what information I should provide. The variable pond is a weight variable that makes the sample match the population. I need to model education (3 categories) with a series of socioeconomic variables (in my example, I use only 2). The problem is still present without the weight and regardless of which independent variables are used.
3) It's probably just something I don't understand with the effect coding (-1 0 1). I want to figure out how I can replicate observed distributions with regression parameters. I have no problem doing this with parameters from dummy coding (0 1).
Now I have lots more thoughts, but before I share I'd like to know more about your study. Something like a Methods section from a manuscript or report (which you have to write eventually anyway) would be a good start.
And could you clarify what you mean by making "the sample fit the population"? In what way does the sample not represent the population (e.g., do you have stratification or clustering or unequal weighting)?
The original weights were unequal and were used to make the sample adequately represent the population by age/sex/country. But as I said, I don't think it matters, because the problem is the same with or without the WEIGHT statement.
There is no proper method yet. It's in development.
I recommend that you look into the SURVEYLOGISTIC procedure to deal with unequal weights. Search lexjansen.com for useful papers on SURVEYLOGISTIC and read the SURVEYLOGISTIC documentation to see whether this procedure would be appropriate for your study.
If you have data in hand, which appears to be the case, then the methods by which those data were acquired, the definitions of variables, etc. are already determined, and you would be able to share those if you chose.
I don't understand. I don't have a problem with the weight or the data. I just want to know how to compute descriptive statistics (such as education by language) from parameters estimated with effect coding (-1 0 1) rather than dummy coding (0 1).
A different coding system will not "make the sample fit the population." You really ought to look into SURVEYLOGISTIC, it might be just the right tool for your problem, and it is able to make the weighted predictions that you appear to be interested in.
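As a rough sketch of what that suggestion could look like, the model posted earlier in the thread might be moved to PROC SURVEYLOGISTIC as follows. The data set name mydata is a placeholder, and no STRATA or CLUSTER statements are shown because the sampling design has not been described; they would need to be added if the design has strata or clusters.

```sas
/* Sketch only: "mydata" is a hypothetical data set name.          */
/* Variable names are taken from the model posted in this thread.  */
proc surveylogistic data=mydata;
   class sex eduM3(ref='2') / param=effect;
   model edu3(ref='2') = sex eduM3 / link=glogit;
   weight pond;               /* sampling weight described above */
   where model=1;
run;
```

The point estimates would normally match a weighted PROC LOGISTIC fit; the difference is that the standard errors account for the sampling weights.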
It means these logit models have the same intercept term.
I don't understand what you mean. The output of the model is this:
Analysis of Maximum Likelihood Estimates

Parameter | | edu3 | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|---|---|
Intercept | | 0 | 1 | -1.3085 | 0.0191 | 4707.0531 | <.0001 |
Intercept | | 1 | 1 | -0.0263 | 0.0104 | 6.3999 | 0.0114 |
sex | 0 | 0 | 1 | 0.0978 | 0.0112 | 76.6518 | <.0001 |
sex | 0 | 1 | 1 | 0.1640 | 0.00906 | 327.2032 | <.0001 |
eduM3 | 0 | 0 | 1 | 1.6999 | 0.0205 | 6854.5466 | <.0001 |
eduM3 | 0 | 1 | 1 | 0.6441 | 0.0127 | 2559.4038 | <.0001 |
eduM3 | 1 | 0 | 1 | -0.4362 | 0.0256 | 290.2973 | <.0001 |
eduM3 | 1 | 1 | 1 | 0.3134 | 0.0136 | 533.6794 | <.0001 |

With those parameters, I should be able to replicate the distributions shown below for each independent variable, but I can't (and I think it is because the intercepts are biased due to the unequal group sizes).
Table of sex by edu3

sex | edu3 = 0 | edu3 = 1 | edu3 = 2 | Total |
---|---|---|---|---|
0 | 22.53 | 46.45 | 31.03 | |
1 | 22.86 | 40.35 | 36.79 | |
Total | | | | |

Table of eduM3 by edu3

eduM3 | edu3 = 0 | edu3 = 1 | edu3 = 2 | Total |
---|---|---|---|---|
0 | 34.07 | 42.57 | 23.36 | |
1 | 6.94 | 52.9 | 40.16 | |
2 | 5.23 | 25.6 | 69.17 | |
Total | | | | |

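A minimal sketch of how the posted effect-coded estimates (the model without EQUALSLOPES) could be turned back into predicted probabilities. It assumes that sex=1 and eduM3=2 are the reference levels (coded -1) and that edu3=2 is the reference response; the data set name pred and the x_ and p_ variable names are made up for this example.

```sas
data pred;
   /* Estimates copied from the posted output:                     */
   /* intercept, sex 0, eduM3 0, eduM3 1                           */
   array b0[4] _temporary_ (-1.3085 0.0978 1.6999 -0.4362); /* edu3=0 vs 2 */
   array b1[4] _temporary_ (-0.0263 0.1640 0.6441  0.3134); /* edu3=1 vs 2 */
   do sex = 0 to 1;
      do eduM3 = 0 to 2;
         /* Effect coding: the assumed reference level gets -1 */
         x_sex = ifn(sex = 0, 1, -1);
         x_m0  = ifn(eduM3 = 0, 1, ifn(eduM3 = 2, -1, 0));
         x_m1  = ifn(eduM3 = 1, 1, ifn(eduM3 = 2, -1, 0));
         eta0  = b0[1] + b0[2]*x_sex + b0[3]*x_m0 + b0[4]*x_m1;
         eta1  = b1[1] + b1[2]*x_sex + b1[3]*x_m0 + b1[4]*x_m1;
         /* Generalized logit: reference category has linear predictor 0 */
         denom  = 1 + exp(eta0) + exp(eta1);
         p_edu0 = exp(eta0) / denom;
         p_edu1 = exp(eta1) / denom;
         p_edu2 = 1 / denom;
         output;
      end;
   end;
   keep sex eduM3 p_edu0 p_edu1 p_edu2;
run;

proc print data=pred noobs;
run;
```

These are probabilities for each sex by eduM3 cell, not for the one-way tables. With effect coding the intercept is the unweighted average of the cell logits, so the one-way tables would be recovered by averaging the cell probabilities weighted by the cell frequencies, not from the intercepts alone.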
That is odd. The two intercepts should have the same estimate if you use EQUALSLOPES.
Can you post the LOG?
The previous output was without the EQUALSLOPES option. With EQUALSLOPES, I now get this:
Analysis of Maximum Likelihood Estimates

Parameter | | edu3 | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|---|---|
Intercept | | 0 | 1 | -0.7922 | 0.0106 | 5571.4516 | <.0001 |
Intercept | | 1 | 1 | -0.1503 | 0.00945 | 253.0618 | <.0001 |
sex | 0 | | 1 | 0.1445 | 0.00762 | 359.9070 | <.0001 |
eduM3 | 0 | | 1 | 0.9336 | 0.0107 | 7599.7073 | <.0001 |
eduM3 | 1 | | 1 | 0.1369 | 0.0117 | 135.8083 | <.0001 |

I'm confused, because it seems that some parameters are missing. The dependent variable has 3 categories, so how should I interpret a parameter such as the one for sex (0.1445)?
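One way to read the EQUALSLOPES output, offered as a sketch, not a definitive answer: according to the PROC LOGISTIC documentation, EQUALSLOPES constrains each slope to be the same in every response function, while each response function keeps its own intercept. Assuming edu3=2 is the reference response level, and writing x for the effect-coded columns (symbols introduced here only for illustration), the two logits would be:

```latex
\begin{aligned}
\log\frac{P(\text{edu3}=0)}{P(\text{edu3}=2)}
  &= -0.7922 + 0.1445\,x_{\text{sex}} + 0.9336\,x_{\text{eduM3},0} + 0.1369\,x_{\text{eduM3},1} \\
\log\frac{P(\text{edu3}=1)}{P(\text{edu3}=2)}
  &= -0.1503 + 0.1445\,x_{\text{sex}} + 0.9336\,x_{\text{eduM3},0} + 0.1369\,x_{\text{eduM3},1}
\end{aligned}
```

Under this reading, 0.1445 is the shift in both log-odds (category 0 versus 2 and category 1 versus 2) for sex=0 relative to the average over the sex levels, so no parameters are missing; the slopes are shared across the two equations.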