cmo5
Calcite | Level 5

I'm wondering if there are any recommendations on whether to code two-level categorical variables as continuous variables or class variables (dummy coded) in a PROC GLIMMIX model with a multinomial outcome. I would prefer to use effect coding with sample-centered IVs so that the intercepts are easier to interpret, but I've found mixed information on whether it's appropriate to interpret intercepts at all in multinomial models.

 

My data have two levels, drinking observations measured across 6 timepoints (3 on ascending limb of blood alcohol curve, 3 on descending limb) nested within subjects.

 

DriveDist = how far participants were willing to drive; 4-level DV (0 miles, 1 mile, 3 miles, 10 miles)

cSex = sex, centered on grand mean (unequal sample sizes of men and women, so mean is near, but not exactly zero)

cMale = sex, centered on males = 0 (females = 1)

cBase_AIDattitudes = baseline attitudes about drinking and driving, continuous IV

Limb = limb of blood alcohol curve, centered on ascending limb = 0 (descending = 1)

 

 

Here are two versions of my syntax:

 

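*Effects coding.;
/* Sex and Limb enter as numeric (centered) covariates; only Subj is a CLASS variable. */
proc glimmix data=three method=laplace empirical=mbn noitprint noclprint;
    where PacketN>2;
    class Subj;
    model DriveDist (order=data) = cSex cBase_AIDattitudes Limb /
          DIST=MULTINOMIAL LINK=CLOGIT SOLUTION CL ddfm=bw
          oddsratio(DIFF=LAST LABEL);
    /* Subject-level random intercept. */
    random intercept / sub=Subj TYPE=VC;
    /* Wald test of the covariance parameter. */
    covtest / WALD;
run;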

*Dummy coding.;
/* Sex (cMale) and Limb enter as CLASS variables here. */
proc glimmix data=three method=laplace empirical=mbn noitprint noclprint;
    where PacketN>2;
    class Subj cMale Limb;
    model DriveDist (order=data) = cMale cBase_AIDattitudes Limb /
          DIST=MULTINOMIAL LINK=CLOGIT SOLUTION CL ddfm=bw
          oddsratio(DIFF=LAST LABEL);
    /* Subject-level random intercept. */
    random intercept / sub=Subj TYPE=VC;
    /* Wald test of the covariance parameter. */
    covtest / WALD;
run;

 

My questions are:

     1) Is it appropriate to treat sex as a continuous IV centered on the sample mean, so that the other effects are interpreted as averaged over sex (reflecting the unequal numbers of men and women in the sample), rather than dummy-coding it (in which case the other effects are interpreted for men only or for women only)?

 

     2) Is it appropriate to treat limb of the blood alcohol curve as a continuous IV, centered on whichever limb I'm interested in? Most likely, I would be interested in interpreting the intercepts (how far participants were willing to drive) on both limbs, and would run the model twice, once when centered on each of the ascending and descending limbs.

***For questions 1 and 2, the -2LL and slopes do not seem to change with either type of coding unless I include a random slope in the models.

 

     3) Is it appropriate to interpret intercepts in multinomial regression the same way as in regular multivariate regressions (i.e., as the likelihood of driving a given distance when all other predictors are zero)?

 

     4) With cumulative logit models, are the intercepts always distinguishing between the highest ordered value and the lower categories? Or are they distinguishing between anything falling above or below a particular cutoff?

         For example, if 0 miles is the highest ordered category and 10 miles is the lowest, is the intercept for "3 miles" the difference between the likelihood of driving: a) 0 miles versus 3 miles or more, OR b) 0-1 miles versus 3 miles or more?

 

Any recommendations on any of these questions are much appreciated!
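
A note on questions 3 and 4: the following is a minimal sketch of a cumulative logit model with a subject-level random intercept. It assumes the four response levels are indexed k = 1, ..., 4 in the order listed in the Response Profile table, and that the procedure models the probabilities of the lower ordered values (the output states which direction is modeled, so this is worth checking). The symbols \alpha_k, \boldsymbol{\beta}, and u_i are notation introduced here for illustration; they do not come from the post.

```latex
\begin{align*}
\operatorname{logit} P(Y_{ij} \le k \mid u_i)
  &= \alpha_k + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i,
  \qquad k = 1, 2, 3, \\
P(Y_{ij} \le k \mid \mathbf{x}_{ij} = \mathbf{0},\; u_i = 0)
  &= \frac{1}{1 + e^{-\alpha_k}} .
\end{align*}
```

Under these assumptions each intercept is a cutpoint: \alpha_k separates levels 1 through k from levels k+1 through 4, rather than one level from all the others, which matches reading (b) in question 4. If the ordered values were 0, 1, 3, and 10 miles, for instance, \alpha_2 would contrast 0-1 miles with 3 miles or more. Each \alpha_k is the cumulative log-odds when all predictors are zero for a subject whose random intercept is zero, so its interpretation depends on how the predictors are centered.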

1 REPLY 1
PaigeMiller
Diamond | Level 26

@cmo5 wrote:

I'm wondering if there are any recommendations on whether to code two-level categorical variables as continuous variables or class variables (dummy coded) in a PROC GLIMMIX model with a multinomial outcome. I would prefer to use effect coding with sample-centered IVs so that the intercepts are easier to interpret, but I've found mixed information on whether it's appropriate to interpret intercepts at all in multinomial models.


I wouldn't do it. A two-level variable is a two-level variable (reading further, it is gender); don't think about treating it as continuous.

 

Intercepts are hard to interpret? Not if you interpret the Least Squares Means instead, which are extremely easy to interpret. Then you can ignore the hard-to-interpret intercepts, and the problem is solved.

 

I wrote a simple explanation comparing Least Squares Means interpretation to the model terms interpretation here: https://communities.sas.com/t5/Statistical-Procedures/Interpreting-Multivariate-Linear-Regression-wi...

--
Paige Miller
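
On the observation in the original post that the -2LL and slopes do not change with the coding unless a random slope is included: the sketch below shows one plausible explanation. The notation is introduced here for illustration (s is the raw 0/1 indicator, m the centering constant, c = s - m the centered version, \sigma_1^2 the random-slope variance, \sigma_{01} the intercept-slope covariance). The random-slope part assumes TYPE=VC, as in the posted syntax, so \sigma_{01} is fixed at zero.

```latex
\begin{align*}
\text{Fixed part:}\quad
  \alpha_k + \beta c &= (\alpha_k - \beta m) + \beta s, \\
\text{Random part:}\quad
  u_{0i} + u_{1i}\, c &= (u_{0i} - m\,u_{1i}) + u_{1i}\, s, \\
\operatorname{Cov}(u_{0i} - m\,u_{1i},\; u_{1i})
  &= \sigma_{01} - m\,\sigma_1^2 = -m\,\sigma_1^2
  \quad \text{when } \sigma_{01} = 0 .
\end{align*}
```

With only a random intercept, recentering shifts each intercept by \beta m and leaves the slope and the likelihood unchanged. A two-level CLASS variable effectively contributes a single 0/1 indicator, so the same argument applies. With a random slope on the recoded variable, the equivalent model under the other coding needs a nonzero intercept-slope covariance, which TYPE=VC cannot represent, so the two codings are no longer the same model and the fit can differ. An unstructured covariance (TYPE=UN) should, in principle, restore the invariance.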
