Working with Indicator Variables

Occasional Contributor
Posts: 13

Consider making an indicator variable for a predictor with 3 levels. Suppose the color variable can be red, blue, or white.

select (color);
   when ('red') do;
      i_Red = 1;
      i_Blue = 0;
   end;
   when ('blue') do;
      i_Red = 0;
      i_Blue = 1;
   end;
   when ('white') do;   /* reference level: both indicators are 0 */
      i_Red = 0;
      i_Blue = 0;
   end;
end;   /* closes the SELECT group */

If we had a fourth level, how would I generalize the above procedure? If we added the color purple, would I have to redo all the code to something along the lines of:

when ('purple') do;
   i_Purple = 1;
   i_Red = 0;
   i_Blue = 0;
   i_White = 0;
end;

And after I have successfully created an indicator variable, how do I go about calling it for a procedure? If I wanted to run the regression procedure, would it be something like:

proc reg data = data_1;
   model y*color;
   when color = 'white'
end;

Thanks in advance.

Respected Advisor
Posts: 2,655

Re: Working with Indicator Variables

Look at PROC GLMMOD. It will create a dataset with indicator variables in it. Your syntax for PROC REG will not work--check the MODEL statement.
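
A minimal sketch of the GLMMOD route, assuming the data set is named data_1 with a response y and a character variable color, as in the question. The output data set names design and parms are made up for illustration, and the column-to-color mapping should be read from parms rather than assumed.

```sas
/* Sketch only: data_1, y, and color come from the question;
   design and parms are illustrative names. */
proc glmmod data=data_1 outdesign=design outparm=parms noprint;
   class color;
   model y = color;
run;

/* parms maps each design column (Col1, Col2, ...) to an effect and level */
proc print data=parms;
run;

/* Col1 is the intercept. Leave out one color column so the model is
   full rank; the omitted level becomes the reference level. */
proc reg data=design;
   model y = col2 col3;
run;
quit;
```

With three colors, Col2 through Col4 hold the color indicators, so dropping Col4 here makes whichever level parms assigns to Col4 the reference.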

Steve Denham

Super User
Posts: 19,851

Re: Working with Indicator Variables

Depending on the procedure, SAS will do that for you with a CLASS statement. Look at how it parameterizes the variable, though: GLM, EFFECT, or REF coding. Generally you want REF, but it's not the default method.

That would be my starting point. What procs are you looking at?
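
As a rough illustration of REF coding, here is a sketch using PROC GLMSELECT, one of the procedures whose CLASS statement accepts PARAM= and REF=. It assumes a data set data_1 with y and color as in the question and that 'white' is one of the color values; the output name design is made up.

```sas
/* Sketch only: data_1, y, and color come from the question;
   design is an illustrative name. */
proc glmselect data=data_1 outdesign(addinputvars)=design noprint;
   class color(ref='white') / param=ref;   /* reference-cell coding */
   model y = color / selection=none;       /* no selection, just build the design */
run;

/* Inspect the generated indicator columns next to the original color */
proc print data=design(obs=5);
run;
```

The names of the generated columns can be checked with PROC PRINT or PROC CONTENTS before using them in the MODEL statement of another procedure.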

Coding systems for categorical variables in regression analysis

Occasional Contributor
Posts: 13

Re: Working with Indicator Variables

I was told that I could not use proc glm. I'm supposed to get familiar with how to do things without it. So I'm trying to generalize the method I learned for categorical variables with 3 levels.

I'm mainly limited to using proc reg.

Respected Advisor
Posts: 2,655

Re: Working with Indicator Variables

So this is a homework type problem?

Almost every SAS proc has access to a CLASS statement. For those that don't, the GLMMOD procedure was created to generate consistent coding for categorical variables. To not use it would be equivalent (to me) to requiring that you write the code in assembler language.

Steve Denham

Occasional Contributor
Posts: 13

Re: Working with Indicator Variables

Posted in reply to SteveDenham

Correct, this is a homework question. I'm not very well-acquainted with SAS, so I'm not sure what you mean by assembler language.

Respected Advisor
Posts: 2,655

Re: Working with Indicator Variables

Assembler language is what was used back in mainframe days. Fortran mapped commands into assembler, which was then converted into the actual binary code. Asking you to write code to do something that has already been verified to work correctly when you use the given PROC falls into that kind of request, to my mind. Why write low-level code when a higher-level version is readily available?

(As you may have guessed, I'm not a real great programmer--but I can do an awful lot in SAS using various PROCs.)

Steve Denham

Occasional Contributor
Posts: 13

Re: Working with Indicator Variables

Posted in reply to SteveDenham

Ah, I see. Well, my teacher explained that a lot of people use proc glm for everything without knowing what they're doing or what they're looking for, because you can't really know what it does unless you understand some of the lower-level code.

I'm assuming for indicator variables, when you have 4 levels, you want 3 indicators. So you basically want {1, 0, 0, 0} for 3 of them, and one with all 0's?
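
The coding described here (4 levels, 3 indicators, one level with all 0's) can be sketched in a data step. This assumes data_1 holds a character variable color with the lower-case values red, blue, purple, and white; data_2 is an illustrative name and white is an arbitrary choice of reference level.

```sas
/* Sketch only: data_1 and color come from the question;
   data_2 is an illustrative name. */
data data_2;
   set data_1;
   /* A comparison evaluates to 1 when true and 0 when false */
   i_Red    = (color = 'red');
   i_Blue   = (color = 'blue');
   i_Purple = (color = 'purple');
   /* 'white' is the reference level: all three indicators are 0 */
run;
```

Adding a level then means adding one line rather than editing every WHEN block. Note that a missing or misspelled color also gets all 0's under this scheme, so it would be counted as white.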

Respected Advisor
Posts: 2,655

Re: Working with Indicator Variables

No, I see. Your instructor is wise--I've seen a lot of GLM used inappropriately because people didn't know what they were doing (which can be corrected fairly easily, and is what your instructor is trying to do with this exercise) or they assumed it did something that it just cannot do.

So, look in the documentation under Shared Concepts and Topics for Levelization of Classification Variables and Parameterization of Model Effects.  In particular, take a look at REF and GLM coding.  I'm just not the right person to ask about data step coding to get there.

Steve Denham

Respected Advisor
Posts: 3,799

Re: Working with Indicator Variables

This is an example that uses PROC TRANSPOSE to create indicator variables. You don't have to know how many levels of COLOR you have. It then uses PROC STDIZE to poke zeros into the missing values, fits the model with REG, and includes a TEST statement to compute the COLOR main effect. (You will need to know the names of the indicator variables to write this statement, or you could use code gen.) Then it runs GLM to check.

data colors;
   length color $8;
   do i = 1 to 40;
      color = chooseC(rantbl(1234,.25,.25,.25),'Red','Blue','Green','Yellow');
      y = rannor(0);
      output;
   end;
   retain one 1;   /* constant 1, transposed below into the indicators */
run;

/* one row per observation, one i_<color> column per level */
proc transpose data=colors out=indic(drop=_name_) prefix=i_;
   by i y;
   var one;
   id color;
run;

/* replace missing indicator values with 0 */
proc stdize reponly missing=0 out=indic2;
   var i_:;
run;

proc reg data=indic2;
   model y = i_:;
   color: test i_red, i_blue, i_green;
run;

proc glm data=colors;
   class color;
   model y = color;
run;

Respected Advisor
Posts: 2,655

Re: Working with Indicator Variables

Posted in reply to data_null__

See--that's what a real SAS programmer, not a PROC hacker like me, can come up with.

Steve Denham

Occasional Contributor
Posts: 13

Re: Working with Indicator Variables

Posted in reply to data_null__

Thanks for the help everyone.

Super User
Posts: 19,851

Re: Working with Indicator Variables

Posted in reply to SteveDenham

I'd actually disagree. It's useful to teach students how categorical variables are treated in regression, but they should learn both methods.

Do it in PROC REG and then compare with PROC GLM or whatever the applicable proc is.
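
A sketch of that comparison for the three-level case, assuming a data set data_1 with y and a character variable color whose values are the lower-case red, blue, and white from the original question. The name ind is made up for illustration.

```sas
/* Sketch only: data_1, y, and color come from the question;
   ind is an illustrative name. */
data ind;
   set data_1;
   i_Red  = (color = 'red');    /* 1 if red, else 0 */
   i_Blue = (color = 'blue');   /* 1 if blue, else 0 */
   /* white is the reference level: both indicators are 0 */
run;

/* Hand-coded indicators in PROC REG */
proc reg data=ind;
   model y = i_Red i_Blue;
run;
quit;

/* Same model with a CLASS variable; SOLUTION prints the estimates */
proc glm data=ind;
   class color;
   model y = color / solution;
run;
quit;
```

PROC GLM sets the estimate for the last level in sort order to 0, which is white under these assumptions, so the intercept and the red and blue estimates should agree between the two procedures.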

And PROC GLM is not GLMMOD: different procs, different purposes entirely.

The GLMMOD procedure constructs the design matrix for a general linear model; it essentially constitutes the model-building front end for the GLM procedure.
