Consider making an indicator variable for a predictor with 3 levels. Suppose the color variable can be red, blue, or white.
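select(color);
   when('red')
      do;
         i_Red = 1;
         i_Blue = 0;
      end;
   when('blue')
      do;
         i_Red = 0;
         i_Blue = 1;
      end;
   when('white')   /* reference level: both indicators are 0 */
      do;
         i_Red = 0;
         i_Blue = 0;
      end;
   otherwise;      /* any other value leaves both indicators missing */
end;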
If we had a fourth level, how would I generalize the above procedure? If we added the color purple, would I have to redo all the code to something along the lines of:
when('purple')
do;
i_Purple = 1;
i_Red = 0;
i_Blue = 0;
i_White = 0;
end;
And after I have successfully created an indicator variable, how do I go about calling it for a procedure? If I wanted to run the regression procedure, would it be something like:
proc reg data = data_1;
model y*color;
when color = 'white'
end;
Thanks in advance.
Look at PROC GLMMOD. It will create a dataset with indicator variables on it. Your syntax for proc reg will not work--check the model statement.
Steve Denham
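A minimal sketch of the GLMMOD route mentioned above, assuming the data set is data_1 with a numeric response y and a character variable color. The output data set names work.design and work.parms are made up for illustration, and the mapping of Col2, Col3, ... to color levels is an assumption (levels are normally ordered alphabetically) that should be checked against the OUTPARM= data set before fitting anything.

```sas
/* Build the design matrix; GLM coding gives one column per level plus an intercept */
proc glmmod data=data_1 outdesign=work.design outparm=work.parms noprint;
   class color;
   model y = color;
run;

/* Show which design column corresponds to which level of color */
proc print data=work.parms;
run;

/* Assumed mapping: Col1 = intercept, Col2 = blue, Col3 = red, Col4 = white.
   Leaving Col4 out of the model makes white the reference level. */
proc reg data=work.design;
   model y = Col2 Col3;
run;
quit;
```

PROC REG adds its own intercept, so Col1 is left out of the MODEL statement as well.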
Depending on the procedure, SAS will do that for you with a CLASS statement. Look at how it parameterizes the variable, though: GLM, EFFECT, or REF. Generally you want REF, but it's not the default method.
That would be my starting point. What procs are you looking at?
Coding systems for categorical variables in regression analysis
I was told that I could not use proc glm. I'm supposed to get familiar with how to do things without it. So I'm trying to generalize the method I learned for categorical variables with 3 levels.
I'm mainly limited to using proc reg.
So this is a homework type problem?
Almost every SAS proc has access to a CLASS statement. For those that don't, the GLMMOD procedure was created to generate consistent coding for categorical variables. To not use it would be equivalent (to me) to requiring that you write the code in assembler language.
Steve Denham
Correct, this is a homework question. I'm not very well-acquainted with SAS, so I'm not sure what you mean by assembler language.
Assembler language is what was used back in mainframe days. Fortran mapped commands into assembler which were then converted into the actual binary code. Asking you to write code to do something that has already been verified to work correctly when you use the given PROC falls into that kind of request to my mind. Why write low level code, when a higher level version is readily available?
(As you may have guessed, I'm not a real great programmer--but I can do an awful lot in SAS using various PROCs.)
Steve Denham
Ah, I see. Well, my teacher explained that a lot of people use proc glm for everything without knowing what they're doing or what they're looking for, because they don't understand what it actually does at the lower level.
I'm assuming for indicator variables, when you have 4 levels, you want 3 indicators. So you basically want {1, 0, 0, 0} for 3 of them, and one with all 0's?
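The k-1 rule asked about here (4 levels, 3 indicators, one reference level with all zeros) could be sketched in a DATA step like this. It assumes data_1 holds y and color, that the values are stored in lowercase, and that white is chosen as the reference level; data_2 is just an illustrative output name. A comparison in SAS evaluates to 1 when true and 0 when false, so each indicator is a single assignment, and adding a level means adding one line.

```sas
data data_2;
   set data_1;
   /* Each comparison returns 1 if true and 0 if false */
   i_Red    = (color = 'red');
   i_Blue   = (color = 'blue');
   i_Purple = (color = 'purple');
   /* white is the reference level: all three indicators are 0 */

   /* Assumption: a blank color should not be counted as white,
      so set the indicators to missing and let PROC REG drop the row */
   if missing(color) then call missing(i_Red, i_Blue, i_Purple);
run;

/* The indicators go on the right-hand side of the MODEL statement */
proc reg data=data_2;
   model y = i_Red i_Blue i_Purple;
run;
quit;
```

Each coefficient is then the difference in mean y between that color and white, and the intercept is the mean for white.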
Now I see. Your instructor is wise--I've seen a lot of GLM used inappropriately because people didn't know what they were doing (which can be corrected fairly easily, and is what your instructor is trying to do with this exercise) or because they assumed it did something that it just cannot do.
So, look in the documentation under Shared Concepts and Topics for Levelization of Classification Variables and Parameterization of Model Effects. In particular, take a look at REF and GLM coding. I'm just not the right person to ask about data step coding to get there.
Steve Denham
This is an example that uses PROC TRANSPOSE to create indicator variables. You don't have to know how many levels of COLOR you have. It then uses PROC STDIZE to poke zeros into the missing values, fits the model with REG, and includes a TEST statement to compute the COLOR main effect. (You will need to know the names of the indicator variables to write this statement, or you could use code gen.) Then it runs GLM to check.
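The code from this post did not survive in the thread, so what follows is only a sketch of the steps described, not the poster's original program. It assumes data_1 contains y and color with the values red, blue, and white and no missing colors; the names rowid, one, and the work.* data sets are made up for illustration.

```sas
/* Add a unique row key and a constant to spread into indicator columns */
data work.long;
   set data_1;
   rowid = _n_;
   one   = 1;
run;

/* One output row per observation; ID creates one i_<level> column per level found */
proc transpose data=work.long out=work.wide(drop=_name_) prefix=i_;
   by rowid y color;
   id color;
   var one;
run;

/* REPONLY replaces missing values only (no standardizing); MISSING=0 fills with 0 */
proc stdize data=work.wide out=work.design reponly missing=0;
   var i_:;
run;

/* Assumed indicator names: i_red, i_blue, i_white. White is left out as reference. */
proc reg data=work.design;
   model y = i_red i_blue;
   Color: test i_red = 0, i_blue = 0;   /* joint test of the COLOR effect */
run;
quit;

/* Check: the F test for color here should agree with the TEST statement above */
proc glm data=data_1;
   class color;
   model y = color;
run;
quit;
```

The indicator names come from the data values, so a new level such as purple would show up as i_purple without changing the TRANSPOSE or STDIZE steps; only the MODEL and TEST statements need updating.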
See--that's what a real SAS programmer, not a PROC hacker like me, can come up with.
Steve Denham
Thanks for the help everyone.
I'd actually disagree. It's useful to teach students how categorical variables are treated in regression, but they should learn both methods.
Do it in PROC REG and then compare with PROC GLM or whatever the applicable proc is.
And PROC GLM is not GLMMOD, different procs, different purposes entirely.
The GLMMOD procedure constructs the design matrix for a general linear model; it essentially constitutes the model-building front end for the GLM procedure.