Hi, I'm trying to change the reference level for a categorical variable in PROC SURVEYREG. Let's say I have the following code:
proc surveyreg data = temp;
weight weightvar;
strata stratum;
cluster psu;
class race;
model income = race /solution;
run;
Let's say that race is a four-level categorical variable with race = 1 being white, race = 2 being black, race = 3 being Asian, and race = 4 being Hispanic. It seems like PROC SURVEYREG uses the highest value of race (race = 4, Hispanic) as the default baseline, but I'd like to set the baseline to another level (say, race = 2, black). How do I do this without recoding my race variable? [I could do a descending sort of race and add ORDER=DATA to the PROC SURVEYREG statement, but that would only help if I wanted race = 1 as my baseline.]
The PROC SURVEYREG documentation describes using formatted values together with the ORDER= option on the PROC SURVEYREG statement.
You can do this in two ways:
- change the values in the dataset itself, or
- use a PROC FORMAT trick. For example, if your data has race = '1', '2', '3', '4', then '4' is used as the reference by default, as you noted. Suppose you want to use '3' as the reference class instead:
proc format;
value $race
'1' = 'A White'
'2' = 'B Black'
'3' = 'D Asian'
'4' = 'C Hispanic'
;
run;
proc surveyreg order=formatted data=temp;
class race; format race $race.;
...
'3' is then used as the reference level because its formatted value ('D Asian') sorts last.
PROC SURVEYLOGISTIC, on the other hand, gives you explicit control over the reference class through the REF= option in the CLASS statement. Here's a link to a paper showing examples: http://www2.sas.com/proceedings/sugi31/140-31.pdf. SURVEYREG probably does not support this, though (I can't test it at the moment).
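If race is stored as a number (the question uses race = 1, 2, 3, 4) rather than as character, the same trick works with a numeric format, i.e. with no $ in the format name. Below is a minimal sketch, assuming race is numeric and that you want black (race = 2) as the reference, as originally asked. The format name racefmt and the prefix letters are illustrative choices, not anything SAS requires; the only point is that the label for 2 must sort last.

```sas
/* Numeric format (no $). The leading letters are chosen only */
/* so that race = 2 (Black) has the highest formatted value.  */
proc format;
  value racefmt
    1 = 'A White'
    2 = 'Z Black'
    3 = 'B Asian'
    4 = 'C Hispanic'
  ;
run;

/* ORDER=FORMATTED sorts the CLASS levels by formatted value, */
/* so the last level ('Z Black', i.e. race = 2) becomes the   */
/* reference level. The data itself is not recoded.           */
proc surveyreg data=temp order=formatted;
  weight weightvar;
  strata stratum;
  cluster psu;
  class race;
  format race racefmt.;
  model income = race / solution;
run;
```

With this ordering, the SOLUTION output should list 'Z Black' last with its estimate fixed at zero, so the other three estimates are differences from Black. The prefix letters will also appear in the printed level labels.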
PROC SURVEYLOGISTIC DATA=ALL(WHERE=(20<=RIDAGEYR)) ;
CLUSTER SDMVPSU; STRATA SDMVSTRA;
CLASS SEX(ref='M') AGE1(ref=FIRST) RACE(ref='Non-Hispanic White');
WEIGHT WTMEC4YR;
MODEL HIGHTC(EVENT='1') = SEX AGE1 RACE ;
RUN;
Thanks so much for the tip on playing with the formats - that did the trick.