Re: Weighted effect coding in regressions

Demographer · Posted 12-13-2016 04:43 AM

Hi,

I would like to fit multinomial logit regressions with effect coding (-1 0 1), but the group sizes are unequal. Because of this, the intercepts correspond to the mean of the group means rather than the true grand mean. Is there a way in SAS to get adjusted intercepts that take the unbalanced data into account?

Thank you.
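For readers unfamiliar with the term in the thread title, here is a minimal sketch of weighted effect coding built by hand. It is an illustration under assumptions, not a tested answer to the question: sashelp.cars stands in for the poster's data, Origin and DriveTrain stand in for edu3 and eduM3, and the column names w_all and w_front are hypothetical. The only change from ordinary effect coding is that the reference level is coded -n_k/n_ref instead of -1.

```sas
/* Sketch only: sashelp.cars and all variable names are stand-ins. */

/* Count the observations in each DriveTrain level. */
proc sql noprint;
  select sum(DriveTrain='All'), sum(DriveTrain='Front'), sum(DriveTrain='Rear')
    into :n_all trimmed, :n_front trimmed, :n_rear trimmed
  from sashelp.cars;
quit;

/* Build weighted effect-coded columns; 'Rear' is the reference level. */
data cars_wec;
  set sashelp.cars;
  select (DriveTrain);
    when ('All')   do; w_all = 1; w_front = 0; end;
    when ('Front') do; w_all = 0; w_front = 1; end;
    otherwise      do; w_all = -&n_all/&n_rear; w_front = -&n_front/&n_rear; end;
  end;
run;

/* The hand-built columns enter the model as ordinary numeric predictors. */
proc logistic data=cars_wec;
  model Origin(ref='USA') = w_all w_front / link=glogit;
run;
```

In a one-factor model like this, each intercept should equal the size-weighted mean of the group logits rather than their unweighted mean. That is still a mean of logits, so it need not equal the logit of the overall proportion.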

Ksharp · Posted 12-13-2016 05:15 AM

Are you fitting a generalized logit regression or a cumulative logit regression?

/* Add four rows with sex='N' so that the response has three levels */
data class;
  set sashelp.class end=last;
  output;
  if last then do;
    sex='N'; weight=34.5;  height=123.4; output;
    sex='N'; weight=134.5; height=23.4;  output;
    sex='N'; weight=74.5;  height=223.4; output;
    sex='N'; weight=44.5;  height=93.4;  output;
  end;
run;

proc logistic data=class;
  model sex=weight height / link=glogit equalslopes;
run;

Demographer · Posted 12-13-2016 05:19 AM

Generalized logit regressions.

What does equalslopes stand for?

Demographer · Posted 12-13-2016 05:22 AM

My model looks like this:

proc logistic;
  class sex eduM3(ref='2') / param=effect;
  model edu3(ref='2') = sex eduM3 / link=glogit rsquare;
  weight pond / norm;
  where model=1;
run;

edu3 has 3 categories.

sld · Posted 12-15-2016 01:03 AM

Some thoughts:

1) Both of the predictor variables in your MODEL statement are on a categorical scale. That is fine, if appropriate, but consequently there are no "slopes" and hence the discussion about the EQUALSLOPES option is moot. I'll add that if you check the SAS documentation (always a good idea), you'll see that the EQUALSLOPES option affects slopes associated with a continuous predictor, and has no impact on intercepts.

2) You have not provided enough information about your study. What is "pond" and how is it related to the study design? What is "model"? In general, if you want a good and appropriate answer to your question, you'll need to provide enough information.

3) Be sure that you understand how different coding systems work; see http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm for an overview. The underlying model is the same regardless of coding system. If you intend to interpret the parameter estimates, then you have to understand the coding system. If you base your interpretation on predicted values, then the coding system is moot, because algebraically it all works out to the same thing.
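One way to check point 3 is to fit the same model under both coding systems and compare the predicted probabilities. The following is a minimal sketch under assumptions: sashelp.cars stands in for the real data, Origin and DriveTrain stand in for edu3 and eduM3, and the output data set names p_effect and p_ref are hypothetical.

```sas
/* Sketch only: sashelp.cars stands in for the poster's data. */
proc logistic data=sashelp.cars;
  class DriveTrain / param=effect;   /* effect coding (-1 0 1) */
  model Origin(ref='USA') = DriveTrain / link=glogit;
  output out=p_effect predprobs=individual;
run;

proc logistic data=sashelp.cars;
  class DriveTrain / param=ref;      /* dummy coding (0 1) */
  model Origin(ref='USA') = DriveTrain / link=glogit;
  output out=p_ref predprobs=individual;
run;

/* Compare the predicted probabilities (IP_ variables) from the two fits. */
proc compare base=p_effect compare=p_ref criterion=1e-6;
  var IP_:;
run;
```

The parameter estimates in the two fits will differ, but PROC COMPARE should find the IP_ variables equal up to the numerical tolerance, since both parameterizations describe the same fitted model.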

Demographer · Posted 12-16-2016 03:30 AM

1) Ok.

2) I am not really sure what information I should provide. The variable pond is a weight variable that makes the sample fit the population. I need to model education (3 categories) with a series of socioeconomic variables (in my example, I try with only 2). The problem is still present without the weight and whatever the independent variables are.

3) It's probably just something I don't understand about effect coding (-1 0 1). I want to figure out how I can replicate the observed distributions from the regression parameters. I have no problem doing this with parameters from dummy coding (0 1).

sld · Posted 12-16-2016 02:33 PM

Now I have lots more thoughts, but before I share I'd like to know more about your study. Something like a Methods section from a manuscript or report (which you have to write eventually anyway) would be a good start.

And could you clarify what you mean by making "the sample fit the population"? In what way does the sample not represent the population (e.g., do you have stratification or clustering or unequal weighting)?

Demographer · Posted 12-16-2016 02:48 PM

The original weight was unequal and was used to make the sample adequately represent the population by age/sex/country. But as I said, I don't think it matters, because the problem is the same with or without the weight statement.

There is no proper method yet. It's in development.

sld · Posted 12-16-2016 03:01 PM

I recommend that you look into the SURVEYLOGISTIC procedure to deal with unequal weights. Search lexjansen.com for useful papers on SURVEYLOGISTIC and read the SURVEYLOGISTIC documentation to see whether this procedure would be appropriate for your study.

If you have data in hand, which appears to be the case, then the methods by which those data were acquired, the definitions of variables, etc. are already determined, and you would be able to share those if you chose.

Demographer · Posted 12-16-2016 03:07 PM

I don't understand. I don't have a problem with the weight or the data. I just want to know how to compute descriptive statistics (such as education by language) from parameters estimated with effect coding (-1 0 1) rather than dummy coding (0 1).

sld · Posted 12-16-2016 04:03 PM

A different coding system will not "make the sample fit the population." You really ought to look into SURVEYLOGISTIC, it might be just the right tool for your problem, and it is able to make the weighted predictions that you appear to be interested in.

Ksharp · Posted 12-13-2016 05:32 AM

It means these logit models have the same intercept term.

Demographer · Posted 12-13-2016 05:47 AM

I don't understand what you mean. The output of the model is this:

Analysis of Maximum Likelihood Estimates
Parameter         edu3  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept         0     1   -1.3085   0.0191          4707.0531        <.0001
Intercept         1     1   -0.0263   0.0104          6.3999           0.0114
sex        0      0     1   0.0978    0.0112          76.6518          <.0001
sex        0      1     1   0.1640    0.00906         327.2032         <.0001
eduM3      0      0     1   1.6999    0.0205          6854.5466        <.0001
eduM3      0      1     1   0.6441    0.0127          2559.4038        <.0001
eduM3      1      0     1   -0.4362   0.0256          290.2973         <.0001
eduM3      1      1     1   0.3134    0.0136          533.6794         <.0001

With those parameters, I should be able to replicate the distribution of edu3 by each independent variable, but I can't (and I think it is because the intercepts are biased due to the unequal group sizes).

Table of sex by edu3
        edu3
sex     0       1       2
0       22.53   46.45   31.03
1       22.86   40.35   36.79

Table of eduM3 by edu3
        edu3
eduM3   0       1       2
0       34.07   42.57   23.36
1       6.94    52.9    40.16
2       5.23    25.6    69.17
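A minimal sketch of the calculation being attempted here, under stated assumptions: the estimates are copied from the output above, edu3=2 is the response reference, and the last level of each predictor (sex=1, eduM3=2) is taken to be the one coded -1 on every design column. The data set name pred and the helper variables s, m0 and m1 are hypothetical.

```sas
/* Sketch only: rebuild predicted probabilities from the effect-coded
   estimates posted above (edu3=2 is the response reference). */
data pred;
  do sex = 0 to 1;
    do eduM3 = 0 to 2;
      /* Effect coding: the last level gets -1 on every design column. */
      s  = ifn(sex = 0, 1, -1);
      m0 = ifn(eduM3 = 0, 1, ifn(eduM3 = 2, -1, 0));
      m1 = ifn(eduM3 = 1, 1, ifn(eduM3 = 2, -1, 0));
      /* Linear predictors for edu3=0 vs 2 and edu3=1 vs 2. */
      eta0 = -1.3085 + 0.0978*s + 1.6999*m0 - 0.4362*m1;
      eta1 = -0.0263 + 0.1640*s + 0.6441*m0 + 0.3134*m1;
      denom = 1 + exp(eta0) + exp(eta1);
      p0 = exp(eta0) / denom;
      p1 = exp(eta1) / denom;
      p2 = 1 / denom;
      output;
    end;
  end;
run;

proc print data=pred noobs;
  var sex eduM3 p0 p1 p2;
  format p0 p1 p2 percent8.2;
run;
```

These are probabilities for each sex by eduM3 cell, not for the one-way tables above. To compare with those tables, the cell probabilities would have to be averaged over the other predictor using the observed (weighted) cell sizes, which is where the unequal group sizes come in.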

Ksharp · Posted 12-13-2016 07:03 AM

That is odd. The two intercepts should have the same estimate if you use equalslopes.

Can you post the LOG?

Demographer · Posted 12-13-2016 07:10 AM

The previous output was without the equalslopes option. With equalslopes, I now have this:

Analysis of Maximum Likelihood Estimates
Parameter         edu3  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept         0     1   -0.7922   0.0106          5571.4516        <.0001
Intercept         1     1   -0.1503   0.00945         253.0618         <.0001
sex        0            1   0.1445    0.00762         359.9070         <.0001
eduM3      0            1   0.9336    0.0107          7599.7073        <.0001
eduM3      1            1   0.1369    0.0117          135.8083         <.0001

I'm confused, because it seems that some parameters are missing. The dependent variable has 3 categories, so how should I interpret a parameter such as the one for sex (0.1445)?
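One possible reading, offered as a sketch rather than a definitive answer: with equalslopes, a single sex coefficient is shared by both logits (edu3=0 vs 2 and edu3=1 vs 2). Assuming sex=0 is coded +1 and sex=1 is coded -1 under effect coding, the two levels differ by 2 on the design column, so the implied odds ratio can be computed as follows. The variable names b_sex and or_sex are hypothetical.

```sas
/* Sketch only: odds ratio implied by the shared sex coefficient. */
data _null_;
  b_sex  = 0.1445;            /* estimate copied from the output above */
  or_sex = exp(2 * b_sex);    /* sex=0 (+1) versus sex=1 (-1) */
  put or_sex= 8.4;
run;
```

Under this reading, the same odds ratio applies to edu3=0 versus 2 and to edu3=1 versus 2, which is why only one row per predictor level appears in the table.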
