Hi Everyone,
I was wondering how to create dummy variables with "white" being the baseline. I have 5 categories (American Indian, White, Black, Pacific Islanders, Asian).
The variable race is coded (0=Black, 1=White, 2=American Indian, 3=Asian, 4=Native Hawaiian).
My code is:
if race=0 then Black=1;
else Black=0;
if race=2 then AmericanIndian=1;
else AmericanIndian=0;
if race=3 then Asian=1;
else Asian=0;
if race=4 then Pacific_Islander=1;
else Pacific_Islander=0;
where White is currently the baseline. However, the variable race has missing data, and I'm assuming observations with a missing race get a value of zero on every dummy too.
For a good article on using a SAS procedure, see this topic from @Rick_SAS :
The best way to generate dummy variables in SAS
There are procedures like TRANSREG and GLMMOD that will code for you.
If you want to code yourself, recognize that the result of a Boolean (logical) expression is 1 if true and 0 if false. So each IF/ELSE pair could be replaced by a simple assignment statement.
Black = (Race = 0);
The parens are not required, but you might want them for clarity.
I think you might need an "unknown" category in that case:
Unknown = missing(race);
data abc;
set abc;
Black = (race=0)*1;
White = (race=1)*1;
AmericanIndian = (race=2)*1;
Asian = (race=3)*1;
NativeHawaiian = (race=4)*1;
run;
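The data step above gives every dummy a value of 0 when race is missing, which makes those rows look like the baseline. Here is a minimal sketch of one alternative, assuming you would rather have those rows drop out of a model than be counted as White. The output data set name abc_dummies is made up for illustration; abc and race come from the thread.

```sas
/* Sketch only: abc_dummies is a hypothetical output name */
data abc_dummies;
  set abc;
  /* A Boolean expression evaluates to 1 (true) or 0 (false) */
  Black            = (race = 0);
  AmericanIndian   = (race = 2);
  Asian            = (race = 3);
  Pacific_Islander = (race = 4);
  /* When race is missing, set the dummies to missing instead of 0 */
  if missing(race) then
    call missing(Black, AmericanIndian, Asian, Pacific_Islander);
run;
```

Writing to a new data set instead of overwriting abc keeps the original data intact if something goes wrong.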
Why do you need dummy variables at all? Almost every SAS modeling procedure creates the dummy variables for you behind the scenes so you don't have to, avoiding all of these pitfalls and potential errors. And you can specify which level is the reference (or baseline) level. This is one of the great advantages of using SAS for modeling.
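As a rough illustration of that approach, here is a sketch using PROC LOGISTIC with White (race=1) as the reference level. The binary outcome variable y is a hypothetical placeholder, since the thread does not say which model or response is being used; abc and race come from the thread.

```sas
/* Sketch only: y is a hypothetical 0/1 outcome, not from the thread */
proc logistic data=abc;
  /* PARAM=REF requests reference (dummy) coding; REF='1' makes White the baseline */
  class race (ref='1') / param=ref;
  model y(event='1') = race;
run;
```

By default, observations with a missing value for a CLASS variable are left out of the analysis, so missing race is not silently folded into the baseline.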
PROC TRANSREG can also be useful here.
/* 0=Black, 1=White, 2=American Indian, 3=Asian, 4=Native Hawaiian */
data race;
do race = 0,.,1 to 4;
output;
end;
run;
proc transreg data=race;
model class(race/zero='1');
id race;
output out=design(drop=intercept) design;
run;
Or perhaps this one.
/* 0=Black, 1=White, 2=American Indian, 3=Asian, 4=Native Hawaiian */
data race;
do race = 0,.,1 to 4;
output;
end;
run;
proc transreg data=race;
model class(race/dev zero='1');
id race;
output out=design(drop=intercept) design;
run;
white = (missing(race) or (race=1))*1;
@subhashmantha wrote:
white = (missing(race) or (race=1))*1;
Or don't even try to create the dummy variables yourself. SAS can create them for you from your data when you use the CLASS statement in a modeling procedure.
Wouldn't a missing value for a class variable like Race either remove the observation or, with the MISSING option, be treated as a different level than Race=1? Perhaps this special case could be handled by using the MISSING option plus a custom format, so that missing and 1 are treated as a single class?
Custom format or custom informat, and then stop writing your own DUMMY variables and use SAS PROCs to compute the dummy variables behind the scenes.
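A sketch of what the custom-format idea might look like, assuming you really do want missing race grouped with White. The format name racegrp and the outcome variable y are hypothetical. My understanding is that CLASS levels are formed from formatted values, so missing and 1 should collapse into one level, but it is worth confirming that in the Class Level Information table.

```sas
/* Sketch only: racegrp and y are hypothetical names */
proc format;
  value racegrp
    ., 1 = 'White or missing'
    0    = 'Black'
    2    = 'American Indian'
    3    = 'Asian'
    4    = 'Native Hawaiian';
run;

proc logistic data=abc;
  format race racegrp.;
  /* MISSING treats missing values as a valid level of the CLASS variable */
  class race (ref='White or missing') / param=ref missing;
  model y(event='1') = race;
run;
```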