Effects coding versus dummy coding with two-level categorical variables in PROC GLIMMIX

cmo5 · Posted 01-03-2020 06:29 PM

I'm wondering if there are any recommendations on whether to code two-level categorical variables as continuous variables or class variables (dummy coded) in a PROC GLIMMIX model with a multinomial outcome. I would prefer to use effect coding with sample-centered IVs so that the intercepts are easier to interpret, but I've found mixed information on whether it's appropriate to interpret intercepts at all in multinomial models.

My data have two levels: drinking observations measured across 6 timepoints (3 on the ascending limb of the blood alcohol curve, 3 on the descending limb) nested within subjects.

DriveDist = how far participants were willing to drive, a 4-level ordinal DV (0 miles, 1 mile, 3 miles, 10 miles)

cSex = sex, centered on the grand mean (unequal numbers of men and women, so the mean is near, but not exactly, zero)

cMale = sex, dummy coded with males = 0 (females = 1)

cBase_Attitudes = baseline attitudes about drinking and driving, continuous IV

Limb = limb of the blood alcohol curve, centered on the ascending limb = 0 (descending = 1)
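Written out as an equation, the effects-coded syntax below fits approximately the following random-intercept cumulative logit model. This is a sketch: the symbols are introduced here for illustration (x_1, x_2 and x_3 stand for cSex, the baseline-attitudes score and Limb), and it assumes the usual SAS convention of modeling the cumulative probabilities Pr(Y ≤ k) with a plus sign on the linear predictor.

```latex
\operatorname{logit}\Pr(Y_{ij} \le k \mid u_i)
  = \alpha_k + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3ij} + u_i,
  \qquad u_i \sim N(0, \sigma_u^2), \quad k = 1, 2, 3
```

Here i indexes subjects, j indexes the drinking observations within a subject, and Y_{ij} is the DriveDist category numbered 1 to 4 in the response order. The dummy-coded version has the same form, with cMale and Limb entering as CLASS effects.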

Here are two versions of my syntax:

*Effects coding.;
*cSex and Limb enter as numeric predictors, so the intercepts refer to
 cSex = 0 (the sample mean), Limb = 0 (the ascending limb) and cBase_AIDattitudes = 0.;
proc glimmix data=three method=laplace empirical=mbn noitprint noclprint;
    where PacketN>2;
    class Subj;
    model DriveDist (order=data) = cSex cBase_AIDattitudes Limb /
          DIST=MULTINOMIAL LINK=CLOGIT SOLUTION CL ddfm=bw
          oddsratio(DIFF=LAST LABEL);
    random intercept / sub=Subj TYPE=VC;
    covtest / WALD;
run;

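*Dummy coding.;
*CLASS effects use the GLM parameterization, so for unformatted 0/1 variables
 the last level (cMale = 1 for females, Limb = 1 for the descending limb) is the
 reference and the intercepts refer to females on the descending limb.;
proc glimmix data=three method=laplace empirical=mbn noitprint noclprint;
    where PacketN>2;
    class Subj cMale Limb;
    model DriveDist (order=data) = cMale cBase_AIDattitudes Limb /
          DIST=MULTINOMIAL LINK=CLOGIT SOLUTION CL ddfm=bw
          oddsratio(DIFF=LAST LABEL);
    random intercept / sub=Subj TYPE=VC;
    covtest / WALD;
run;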
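A sketch of why the two codings fit identically when the model has only a random intercept (notation introduced here for illustration, not taken from the syntax): let x be a 0/1 variable such as sex or limb, x̄ its sample mean, and x^c = x − x̄ its centered version. The linear predictor for cutoff k can be rewritten either way:

```latex
\begin{aligned}
\alpha_k + \beta x
  &= (\alpha_k + \beta\bar{x}) + \beta\,(x - \bar{x})
   = \alpha_k^{c} + \beta\,x^{c} \\
  &= (\alpha_k + \beta) + (-\beta)\,\mathbf{1}\{x = 0\}
   = \alpha_k^{D} + \gamma\,\mathbf{1}\{x = 0\}
\end{aligned}
```

The first line is the centered coding; the second is the CLASS coding with the last level (x = 1) as the reference, as under the default GLM parameterization. These are the same model: only the intercepts shift (and the slope changes sign in the CLASS version), so -2LL is unchanged. Adding a random slope b_i x with TYPE=VC, which fixes the intercept-slope covariance at zero, breaks the equivalence: b_i x^c = b_i x − b_i x̄ moves part of the slope variation into the intercept, which an uncorrelated structure cannot absorb, so -2LL can differ between codings. An unstructured covariance (TYPE=UN) should restore the equivalence.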

My questions are:

1) Is it appropriate to treat sex as a continuous IV centered on the sample mean, so that the other effects are interpreted as accounting for sex (and for the unequal numbers of men and women in the sample), rather than dummy coding it (which results in interpreting the other effects for only men or only women)?

2) Is it appropriate to treat limb of the blood alcohol curve as a continuous IV, centered on whichever limb I'm interested in? Most likely, I would want to interpret the intercepts (how far participants were willing to drive) on both limbs, so I would run the model twice: once centered on the ascending limb and once centered on the descending limb.

Note: for questions 1 and 2, the -2LL and slopes do not seem to change between the two types of coding unless I include a random slope in the model.

3) Is it appropriate to interpret intercepts in multinomial regression the same way as in ordinary multiple regression (i.e., as the likelihood of driving x distance when all predictors are zero)?

4) With cumulative logit models, are the intercepts always distinguishing between the highest ordered value and the lower categories? Or are they distinguishing between anything falling above or below a particular cutoff?

For example, if 0 miles is the highest ordered category and 10 miles is the lowest, does the intercept for "3 miles" represent the difference between the likelihood of driving a) 0 miles versus 3 miles or more, or b) 0-1 miles versus 3 miles or more?
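For reference, under a cumulative logit link each intercept belongs to a cutoff between ordered categories rather than to a single category. A sketch, assuming the Pr(Y ≤ k) parameterization used by SAS cumulative logit models, with categories numbered 1 to 4 in the response order (here set by ORDER=DATA, i.e., order of first appearance in the data); the symbols are introduced for illustration, with u the subject's random intercept:

```latex
\begin{aligned}
\eta_k &= \alpha_k + \mathbf{x}^{\top}\boldsymbol{\beta} + u, \qquad k = 1, 2, 3, \\
\Pr(Y \le k \mid \mathbf{x}, u) &= F(\eta_k) = \frac{1}{1 + e^{-\eta_k}}, \\
\alpha_k &= \log\frac{\Pr(Y \le k \mid \mathbf{x} = \mathbf{0},\, u = 0)}{\Pr(Y > k \mid \mathbf{x} = \mathbf{0},\, u = 0)}, \\
\Pr(Y = 1) &= F(\eta_1), \quad
\Pr(Y = k) = F(\eta_k) - F(\eta_{k-1}) \ (k = 2, 3), \quad
\Pr(Y = 4) = 1 - F(\eta_3).
\end{aligned}
```

Read this way, α_2 contrasts the first two categories combined with the last two, so each intercept splits the ordered categories at a cutoff instead of comparing one end category with everything else. Because the model is conditional on u, α_k describes a subject whose random intercept is zero, not a population-averaged probability.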

Any recommendations on any of these questions are much appreciated!

PaigeMiller · Posted 01-03-2020 06:39 PM

@cmo5 wrote:

I'm wondering if there are any recommendations on whether to code two-level categorical variables as continuous variables or class variables (dummy coded) in a PROC GLIMMIX model with a multinomial outcome. I would prefer to use effect coding with sample-centered IVs so that the intercepts are easier to interpret, but I've found mixed information on whether it's appropriate to interpret intercepts at all in multinomial models.

I wouldn't do it. A two-level variable is a two-level variable (reading further, it is gender); don't think about treating it as continuous.

Intercepts are hard to interpret? Not if you interpret the Least Squares Means, which are extremely easy to interpret. Then you can ignore the hard-to-interpret intercepts, and the problem is solved.

I wrote a simple explanation comparing Least Squares Means interpretation to the model terms interpretation here: https://communities.sas.com/t5/Statistical-Procedures/Interpreting-Multivariate-Linear-Regression-wi...

--
Paige Miller
