Demographer
Pyrite | Level 9

Hi,

I would like to fit multinomial logit regressions with effect coding (-1 0 1), but the group sizes are unequal. Because of this, the intercepts correspond to the mean of the group means rather than the real grand mean. I wonder whether there is a way in SAS to get adjusted intercepts that take the unbalanced data into account.

 

Thank you.

17 REPLIES
Ksharp
Super User

Are you fitting a generalized logit regression or a cumulative logit regression?


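/* Append four artificial observations with sex='N' so that sex has three levels */
data class;
  set sashelp.class end=last;
  output;
  if last then do;
    sex='N'; weight=34.5;  height=123.4; output;
    sex='N'; weight=134.5; height=23.4;  output;
    sex='N'; weight=74.5;  height=223.4; output;
    sex='N'; weight=44.5;  height=93.4;  output;
  end;
run;


/* Generalized logit model fitted with the EQUALSLOPES option */
proc logistic data=class;
  model sex=weight height / link=glogit equalslopes;
run;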

Demographer
Pyrite | Level 9

Generalized logit regressions.

What does EQUALSLOPES stand for?

Demographer
Pyrite | Level 9

My model looks like this:

 


proc logistic;
  class sex eduM3(ref='2') / param=effect;   /* effect coding (-1 0 1) */
  model edu3(ref='2') = sex eduM3 / link=glogit rsquare;
  weight pond / norm;                        /* pond: weight variable, normalized */
  where model=1;
run;

 

edu3 has 3 categories.

 

sld
Rhodochrosite | Level 12

Some thoughts:

 

1)  Both of the predictor variables in your MODEL statement are on a categorical scale. That is fine, if appropriate, but consequently there are no "slopes" and hence the discussion about the EQUALSLOPES option is moot. I'll add that if you check the SAS documentation (always a good idea), you'll see that the EQUALSLOPES option affects slopes associated with a continuous predictor, and has no impact on intercepts.

2) You have not provided enough information about your study. What is "pond" and how is it related to the study design? What is "model"? In general, if you want a good and appropriate answer to your question, you'll need to provide enough information.

3) Be sure that you understand how different coding systems work. See http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm   The underlying model is the same, regardless of coding system. If you intend to interpret the parameter estimates, then you have to understand the coding system. If you do interpretation based on predicted values, then the coding system is moot because algebraically it all works out to the same thing.
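
A minimal sketch of point 3, not based on the original poster's data: it fits the same generalized logit model twice on sashelp.cars (an assumed stand-in data set, with Origin as a three-level response and DriveTrain as a categorical predictor), once with effect coding and once with reference coding, and then compares the predicted probabilities. The data set names pred_effect and pred_ref are made up for illustration.

```sas
/* Fit with effect coding (-1 0 1) and save individual predicted probabilities */
proc logistic data=sashelp.cars noprint;
  class DriveTrain / param=effect;
  model Origin(ref='USA') = DriveTrain / link=glogit;
  output out=pred_effect predprobs=individual;
run;

/* Fit the same model with reference (dummy) coding */
proc logistic data=sashelp.cars noprint;
  class DriveTrain / param=ref;
  model Origin(ref='USA') = DriveTrain / link=glogit;
  output out=pred_ref predprobs=individual;
run;

/* Compare the IP_ variables (predicted probability of each Origin level) */
proc compare base=pred_effect compare=pred_ref criterion=1e-6;
  var IP_:;
run;
```

If the two parameterizations describe the same underlying model, PROC COMPARE should find no differences beyond the tolerance, even though the two sets of parameter estimates look different.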

 

 

 

Demographer
Pyrite | Level 9

1) Ok.

2) I am not really sure what information I should provide. The variable pond is a weight variable that makes the sample fit the population. I need to model education (3 categories) with a series of socioeconomic variables (in my example, I am trying with only 2). The problem is still present without the weight, and whatever the independent variables are.

3) It's probably just something I don't understand with the effect coding (-1 0 1). I want to figure out how I can replicate observed distributions with regression parameters. I have no problem doing this with parameters from dummy coding (0 1).

sld
Rhodochrosite | Level 12

Now I have lots more thoughts, but before I share I'd like to know more about your study. Something like a Methods section from a manuscript or report (which you have to write eventually anyway) would be a good start.

 

And could you clarify what you mean by making "the sample fit the population"? In what way does the sample not represent the population (e.g., do you have stratification or clustering or unequal weighting)?

Demographer
Pyrite | Level 9

The original weight was unequal and was used to make the sample adequately represent the population by age/sex/country. But as I said, I don't think it matters, because the problem is the same with or without the WEIGHT statement.

 

There is no proper method yet. It's in development.

sld
Rhodochrosite | Level 12

I recommend that you look into the SURVEYLOGISTIC procedure to deal with unequal weights. Search lexjansen.com for useful papers on SURVEYLOGISTIC and read the SURVEYLOGISTIC documentation to see whether this procedure would be appropriate for your study.

 

If you have data in hand, which appears to be the case, then the methods by which those data were acquired, the definitions of variables, etc. are already determined, and you would be able to share those if you chose.

Demographer
Pyrite | Level 9

I don't understand. I don't have a problem with the weight or the data. I just want to know how to compute descriptive statistics (such as education by language) from parameters estimated with effect coding (-1 0 1) rather than dummy coding (0 1).

sld
Rhodochrosite | Level 12

A different coding system will not "make the sample fit the population." You really ought to look into SURVEYLOGISTIC; it might be just the right tool for your problem, and it is able to make the weighted predictions that you appear to be interested in.

Ksharp
Super User

It means these logit models have the same intercept term.

Demographer
Pyrite | Level 9

I don't understand what you mean. The output of the model is this:

 

Analysis of Maximum Likelihood Estimates

Parameter     edu3  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept     0     1   -1.3085   0.0191          4707.0531        <.0001
Intercept     1     1   -0.0263   0.0104          6.3999           0.0114
sex       0   0     1    0.0978   0.0112          76.6518          <.0001
sex       0   1     1    0.1640   0.00906         327.2032         <.0001
eduM3     0   0     1    1.6999   0.0205          6854.5466        <.0001
eduM3     0   1     1    0.6441   0.0127          2559.4038        <.0001
eduM3     1   0     1   -0.4362   0.0256          290.2973         <.0001
eduM3     1   1     1    0.3134   0.0136          533.6794         <.0001

 

With those parameters, I should be able to replicate the distribution of edu3 for each independent variable, but I can't (and I think it is because the intercepts are biased due to the unequal group sizes).

 

Table of sex by edu3

sex    edu3=0  edu3=1  edu3=2
0      22.53   46.45   31.03
1      22.86   40.35   36.79

Table of eduM3 by edu3

eduM3  edu3=0  edu3=1  edu3=2
0      34.07   42.57   23.36
1      6.94    52.9    40.16
2      5.23    25.6    69.17
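
One way to check the arithmetic by hand, sketched under assumptions: the DATA step below plugs the estimates from the output above into the generalized logit formulas. It assumes that sex=1 and eduM3=2 are the levels coded -1 (the default last level and the ref='2' level in the CLASS statement) and that edu3=2 is the reference response. The names pred_from_estimates, x_sex0, x_m0 and x_m1 are hypothetical.

```sas
data pred_from_estimates;
  do sex = 0 to 1;
    do eduM3 = 0 to 2;
      /* Effect-coded design columns: the assumed reference level gets -1 */
      x_sex0 = ifn(sex = 0, 1, -1);
      x_m0   = ifn(eduM3 = 2, -1, ifn(eduM3 = 0, 1, 0));
      x_m1   = ifn(eduM3 = 2, -1, ifn(eduM3 = 1, 1, 0));
      /* Linear predictors for edu3=0 and edu3=1, each versus edu3=2 */
      eta0 = -1.3085 + 0.0978*x_sex0 + 1.6999*x_m0 - 0.4362*x_m1;
      eta1 = -0.0263 + 0.1640*x_sex0 + 0.6441*x_m0 + 0.3134*x_m1;
      /* Convert the two logits into three probabilities */
      denom  = 1 + exp(eta0) + exp(eta1);
      p_edu0 = exp(eta0) / denom;
      p_edu1 = exp(eta1) / denom;
      p_edu2 = 1 / denom;
      output;
    end;
  end;
  format p_edu0 p_edu1 p_edu2 percent8.2;
run;

proc print data=pred_from_estimates noobs;
  var sex eduM3 p_edu0 p_edu1 p_edu2;
run;
```

These are probabilities for each sex-by-eduM3 cell under a main-effects model. A one-way table such as eduM3 by edu3 pools both sexes, so the cell probabilities would presumably have to be averaged over sex, using the (weighted) share of each sex within each eduM3 level, before they are compared with the observed percentages.
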
Ksharp
Super User

That is odd. The two intercepts should have the same estimate if you use EQUALSLOPES.

Can you post the log?

Demographer
Pyrite | Level 9

The previous output was without the EQUALSLOPES option. With EQUALSLOPES, I now have this:

 

 

Analysis of Maximum Likelihood Estimates

Parameter     edu3  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept     0     1   -0.7922   0.0106          5571.4516        <.0001
Intercept     1     1   -0.1503   0.00945         253.0618         <.0001
sex       0         1    0.1445   0.00762         359.9070         <.0001
eduM3     0         1    0.9336   0.0107          7599.7073        <.0001
eduM3     1         1    0.1369   0.0117          135.8083         <.0001

 

I'm confused, because it seems that some parameters are missing. The dependent variable has 3 categories, so how should I interpret a parameter such as the one for sex (0.1445)?
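
A hedged reading of this output, offered as a sketch rather than a definitive answer: with EQUALSLOPES, each effect appears to get a single coefficient shared by both response functions (edu3=0 versus 2 and edu3=1 versus 2), and only the intercepts differ. Under that reading, and assuming again that sex=1 and eduM3=2 are the levels coded -1, the predicted probabilities could be rebuilt as follows. The names pred_equalslopes, x_sex0, x_m0, x_m1 and xb are hypothetical.

```sas
data pred_equalslopes;
  do sex = 0 to 1;
    do eduM3 = 0 to 2;
      /* Effect-coded design columns: the assumed reference level gets -1 */
      x_sex0 = ifn(sex = 0, 1, -1);
      x_m0   = ifn(eduM3 = 2, -1, ifn(eduM3 = 0, 1, 0));
      x_m1   = ifn(eduM3 = 2, -1, ifn(eduM3 = 1, 1, 0));
      /* One common slope term, used in both response functions */
      xb   = 0.1445*x_sex0 + 0.9336*x_m0 + 0.1369*x_m1;
      eta0 = -0.7922 + xb;   /* edu3=0 versus edu3=2 */
      eta1 = -0.1503 + xb;   /* edu3=1 versus edu3=2 */
      denom  = 1 + exp(eta0) + exp(eta1);
      p_edu0 = exp(eta0) / denom;
      p_edu1 = exp(eta1) / denom;
      p_edu2 = 1 / denom;
      output;
    end;
  end;
run;

proc print data=pred_equalslopes noobs;
  var sex eduM3 p_edu0 p_edu1 p_edu2;
run;
```

Read this way, the sex parameter moves both log odds (edu3=0 versus 2 and edu3=1 versus 2) up by 0.1445 for sex=0 and down by the same amount for sex=1.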

