I have data containing counts across the following three variables. Possible values are noted below (3 potential values each).
I am trying to create a logistic regression model to fit this data in order to estimate the odds ratio comparing the "COLLEGE" and "LESS" categories of my EDUCATION_LEVEL variable, adjusting for variation in region.
Honestly, any sort of statistical test indicating the difference between these two categories would be useful. I just can't seem to get the procedure to produce anything I can use. I feel like I've been messing with this for way too long and am thinking of scrapping the whole thing.
When I run the code below, all I get is odds ratios for every specific combination of the three variables. I can't figure out how to write a CONTRAST statement that gives me a single ratio pooled across the regions.
PROC LOGISTIC DATA=WORK.EDUC ORDER=data;
  FREQ COUNT;
  CLASS EDUCATION_LEVEL (ref='LESS') REGION (ref='WEST') / param=reference order=FORMATTED;
  MODEL AGREEMENT = EDUCATION_LEVEL REGION EDUCATION_LEVEL*REGION / link=clogit SCALE=NONE AGGREGATE;
  FORMAT EDUCATION_LEVEL EDUCATION_LEVEL. REGION REGION.;
  ODDSRATIO EDUCATION_LEVEL / CL=BOTH DIFF=REF;
  TITLE "main effects partial proportional odds model";
  CONTRAST 'COLLEGE vs LESS' EDUCATION_LEVEL 2 / ESTIMATE=exp;
RUN;
If you are on SAS 9.22 or SAS/STAT 12.1, take a look at the LSMESTIMATE statement. If you code the coefficients for your class levels properly and use the EXP option, you should be able to get the odds ratios you want. Let us know if it works (because I'm kind of spitballing on this).
Steve Denham
Sorry, I couldn't get that to work at all. I don't really know how to use that statement, though.
Try the following:
PROC LOGISTIC DATA=WORK.EDUC ORDER=data;
  FREQ COUNT;
  /* PROC LOGISTIC accepts LSMEANS only with the GLM parameterization, so param=reference is changed to param=glm */
  CLASS EDUCATION_LEVEL (ref='LESS') REGION (ref='WEST') / param=glm order=FORMATTED;
  MODEL AGREEMENT = EDUCATION_LEVEL REGION EDUCATION_LEVEL*REGION / link=clogit SCALE=NONE AGGREGATE;
  FORMAT EDUCATION_LEVEL EDUCATION_LEVEL. REGION REGION.;
  LSMEANS EDUCATION_LEVEL*REGION / E;
RUN;
This should give the means on the cumulative logit scale, and the /e option will give the coefficients, in order, for each of these means. You could then consolidate them as you want to get the comparisons. An example might be:
LSMESTIMATE education_level*region 'college vs less, averaged over regions' <INSERT APPROPRIATE VALUES HERE>;
where the appropriate values would look something like 1 0 -1 1 0 -1 1 0 -1 / divisor=3 exp. I kept the values out of the statement itself because I am not sure how everything will sort without running the code. It looks like there are nine estimated values, and this would average over regions. Order is critical, however, which is why I added the /e option to the LSMEANS statement.
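To make the placeholder concrete, here is a rough sketch of what the finished step might look like. I have not run it. The coefficient order is an assumption: it supposes the nine LS-means are listed with EDUCATION_LEVEL varying slowest and REGION fastest, with COLLEGE as the first education level and LESS as the last. If the /E listing shows a different order, the 1s, 0s and -1s have to move accordingly.

```sas
/* Sketch only, untested. ASSUMED order of the nine LS-means:         */
/*   positions 1-3 = COLLEGE in each region,                           */
/*   positions 4-6 = the third education level,                        */
/*   positions 7-9 = LESS in each region.                              */
/* Check this against the LSMEANS / E output before trusting it.       */
PROC LOGISTIC DATA=WORK.EDUC;
  FREQ COUNT;
  /* LSMEANS and LSMESTIMATE need the GLM parameterization */
  CLASS EDUCATION_LEVEL (ref='LESS') REGION (ref='WEST') / param=glm order=FORMATTED;
  MODEL AGREEMENT = EDUCATION_LEVEL REGION EDUCATION_LEVEL*REGION / link=clogit;
  FORMAT EDUCATION_LEVEL EDUCATION_LEVEL. REGION REGION.;
  /* E prints the coefficients so the ordering can be verified */
  LSMEANS EDUCATION_LEVEL*REGION / E;
  /* Average COLLEGE minus LESS over the three regions, then exponentiate */
  LSMESTIMATE EDUCATION_LEVEL*REGION 'college vs less, averaged over regions'
      1 1 1   0 0 0   -1 -1 -1 / divisor=3 exp cl;
RUN;
```

The DIVISOR=3 option turns the sum of the three region-specific differences into their average on the logit scale, and EXP reports that average difference as an odds ratio.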
Steve Denham
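If the education-by-region interaction turns out not to be needed, a simpler route may be to drop it. With only main effects in the model, the ODDSRATIO statement should report each education level against LESS as one ratio adjusted for region, rather than one ratio per region. This is a sketch reusing the data set, formats and variable names from the question. It is untested, and it assumes a common education effect across regions is acceptable for your data.

```sas
/* Sketch only, untested: main-effects cumulative logit model.              */
/* ASSUMES the EDUCATION_LEVEL*REGION interaction can reasonably be dropped. */
PROC LOGISTIC DATA=WORK.EDUC;
  FREQ COUNT;
  CLASS EDUCATION_LEVEL (ref='LESS') REGION (ref='WEST') / param=reference order=FORMATTED;
  MODEL AGREEMENT = EDUCATION_LEVEL REGION / link=clogit;
  FORMAT EDUCATION_LEVEL EDUCATION_LEVEL. REGION REGION.;
  /* Each non-reference education level vs LESS, adjusted for REGION */
  ODDSRATIO EDUCATION_LEVEL / CL=BOTH DIFF=REF;
RUN;
```

Comparing this fit with the interaction model is one way to judge whether pooling over regions is defensible.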