Working with Indicator Variables

06-13-2013 12:21 PM

Consider making indicator variables for a predictor with three levels. Suppose the variable COLOR can be red, blue, or white.

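```
/* Runs inside a DATA step; 'white' is the reference level (all zeros) */
select (color);
   when ('red') do;
      i_Red  = 1;
      i_Blue = 0;
   end;
   when ('blue') do;
      i_Red  = 0;
      i_Blue = 1;
   end;
   when ('white') do;
      i_Red  = 0;
      i_Blue = 0;
   end;
   otherwise;   /* any other value: the indicators are left missing */
end;
```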

If we had a fourth level, how would I generalize the above procedure? For example, if we added the color purple, would I have to rewrite all of the code along the lines of:

```
when ('purple') do;
   i_Purple = 1;
   i_Red    = 0;
   i_Blue   = 0;
   i_White  = 0;
end;
```

And after I have successfully created the indicator variables, how do I use them in a procedure? If I wanted to run PROC REG, would it be something like:

```
proc reg data = data_1;
   model y*color;
   when color = 'white'
end;
```

Thanks in advance.


06-13-2013 12:53 PM

Look at PROC GLMMOD. It will create a data set containing indicator variables. Your syntax for PROC REG will not work; check the MODEL statement.

Steve Denham
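For the MODEL statement, a minimal sketch assuming data_1 already holds Y plus the i_Red and i_Blue indicators from the SELECT block in the question. PROC REG takes the response on the left of the equals sign and the numeric regressors on the right; there is no WHEN clause. The reference level (white) is simply left out, and a labeled TEST statement gives the joint F test for COLOR.

```sas
proc reg data=data_1;
   /* Response = indicators; white (all zeros) is the baseline absorbed by the intercept */
   model y = i_Red i_Blue;
   /* Joint test that both indicator coefficients are 0, i.e. the COLOR main effect */
   color: test i_Red, i_Blue;
run;
quit;
```

With a fourth level you would add its indicator to both the MODEL and TEST statements.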


06-13-2013 12:56 PM

Depending on the procedure, SAS will do that for you with a CLASS statement. Look at how it parameterizes the variable, though: GLM, EFFECT, or REF. Generally you want REF, but it's not the default method.

That would be my starting point. Which procs are you looking at?

Coding systems for categorical variables in regression analysis
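As an illustration of the CLASS-statement route, a sketch using PROC GLMSELECT, chosen here only because it fits ordinary least squares and accepts PARAM= on its CLASS statement (it is not named in the thread). It assumes data_1 holds Y and the character COLOR. REF='white' makes white the reference level, so the estimates should line up with hand-made i_Red and i_Blue indicators; SELECTION=NONE fits the full model without any variable selection.

```sas
proc glmselect data=data_1;
   /* Reference-cell coding with white as the omitted level */
   class color(ref='white') / param=ref;
   /* Fit the full model as specified; no stepwise selection */
   model y = color / selection=none;
run;
```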


06-13-2013 01:00 PM

I was told that I could not use PROC GLM; I'm supposed to get familiar with how to do things without it. So I'm trying to generalize the method I learned for categorical variables with three levels.

I'm mainly limited to using PROC REG.


06-13-2013 01:07 PM

So this is a homework type problem?

Almost every SAS proc has a CLASS statement. For those that don't, the GLMMOD procedure was created to generate consistent coding for categorical variables. Not using it would be equivalent (to me) to requiring that you write the code in assembler language.

Steve Denham
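A sketch of the GLMMOD route, assuming data_1 holds Y and the three-level COLOR; the data set names design and parms are hypothetical. GLMMOD writes the design matrix to the OUTDESIGN= data set as columns COL1, COL2, and so on, and the OUTPARM= data set records which effect and level each column stands for. Under GLM coding, COL1 is the intercept and there is one column per level, so one level column has to be dropped before PROC REG can fit a full-rank model.

```sas
/* Build the design matrix; NOPRINT suppresses the displayed output */
proc glmmod data=data_1 outdesign=design outparm=parms noprint;
   class color;
   model y = color;
run;

/* Check which COLn corresponds to which COLOR level */
proc print data=parms;
run;

/* Levels sort as blue, red, white, so COL2=blue and COL3=red are expected here;
   leaving out COL4 makes white the reference. Confirm against PARMS first. */
proc reg data=design;
   model y = col2 col3;
run;
quit;
```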


06-13-2013 01:12 PM

Correct, this is a homework question. I'm not very well-acquainted with SAS, so I'm not sure what you mean by assembler language.


06-13-2013 01:18 PM

Assembler language is what was used back in the mainframe days. Fortran mapped commands into assembler, which was then converted into the actual binary code. Asking you to write code for something that has already been verified to work correctly in a given PROC falls into that kind of request, to my mind. Why write low-level code when a higher-level version is readily available?

(As you may have guessed, I'm not a really great programmer, but I can do an awful lot in SAS using various PROCs.)

Steve Denham


06-13-2013 01:26 PM

Ah, I see. My teacher explained that a lot of people use PROC GLM for everything without knowing what they're doing or what they're looking for, because you don't really know what it does unless you know what the lower-level code does.

I'm assuming that for indicator variables, when you have 4 levels, you want 3 indicators. So three of the levels each get a single 1 in their own indicator (e.g., {1, 0, 0}), and the fourth level gets all 0's?
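A minimal sketch of the four-level case, assuming data_1 (the name used in the question) holds a character COLOR; the output name data_2 is hypothetical. In SAS a comparison such as (color = 'red') evaluates to 1 or 0, so one assignment per indicator replaces the whole SELECT block, and adding a level means adding one line. Only three indicators are created; white, the level with all zeros, is the reference.

```sas
data data_2;
   set data_1;
   /* Each logical comparison returns 1 (true) or 0 (false) */
   i_Red    = (color = 'red');
   i_Blue   = (color = 'blue');
   i_Purple = (color = 'purple');
   /* No i_White: white is the reference level (all three indicators 0) */

   /* Blank or unexpected values should not silently become the reference */
   if color not in ('red', 'blue', 'purple', 'white') then
      call missing(i_Red, i_Blue, i_Purple);
run;
```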


06-13-2013 01:55 PM

Now I see. Your instructor is wise; I've seen a lot of GLM used inappropriately because people didn't know what they were doing (which can be corrected fairly easily, and is what your instructor is trying to do with this exercise) or because they assumed it did something that it just cannot do.

So, look in the documentation under Shared Concepts and Topics for Levelization of Classification Variables and Parameterization of Model Effects. In particular, take a look at REF and GLM coding. I'm just not the right person to ask about data step coding to get there.

Steve Denham
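To see what REF and GLM coding look like for the four-color case, a DATA step sketch that builds both sets of columns explicitly. The data set name coded, the array names, and the g_/r_ variable names are hypothetical. The level list is hard-coded in sorted order and, as with the default for PARAM=REF, the last level (white) is the reference.

```sas
data coded;
   set data_1;   /* assumed: character COLOR with the four levels below */
   /* Level names in sorted order; white (last) is the reference level */
   array lev[4] $ 6 _temporary_ ('blue' 'purple' 'red' 'white');
   /* GLM coding: one 0/1 column per level. Together with the intercept
      these columns are linearly dependent, so X'X is singular. */
   array gcol[4] g_blue g_purple g_red g_white;
   /* REF coding: the same columns minus the reference level */
   array rcol[3] r_blue r_purple r_red;

   do k = 1 to dim(gcol);
      gcol[k] = (color = lev[k]);   /* comparison is case-sensitive */
   end;
   do k = 1 to dim(rcol);
      rcol[k] = gcol[k];
   end;
   drop k;
run;
```

Regressing Y on r_blue, r_purple, and r_red gives the same fit as a CLASS statement with PARAM=REF and white as the reference.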


06-13-2013 01:59 PM

This is an example that uses PROC TRANSPOSE to create indicator variables, so you don't have to know how many levels of COLOR you have. It then uses PROC STDIZE to replace the missing values with zeros, fits the model with PROC REG, and includes a TEST statement to compute the COLOR main effect. (You will need to know the names of the indicator variables to write the TEST statement, or you could generate it with code.) Finally, it runs PROC GLM as a check.

```
length color $
do i =
   color = chooseC(rantbl(
   y = rannor(
   output;
end;

retain one

by i y;
var one;
id color;

var i_:;

model y = i_:;
color: test i_red, i_blue, i_green;

class color;
model y = color;
```
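The steps described above can be sketched end to end as follows. This is a reconstruction under stated assumptions, not the original program: the sample size, seeds, probabilities, and the data set names colors and wide are all hypothetical. It also differs from the fragments in one respect: it lists two indicators explicitly (green as the reference) so the REG model is full rank, since all three indicators plus an intercept are linearly dependent.

```sas
/* Simulated data: sample size, seeds, and probabilities are assumptions */
data colors;
   length color $ 8;
   retain one 1;   /* constant that TRANSPOSE spreads into the indicator columns */
   do i = 1 to 100;
      /* RANTBL returns 1, 2, or 3; CHOOSEC maps that index to a color name */
      color = choosec(rantbl(123, 0.3, 0.3), 'red', 'blue', 'green');
      y = rannor(456);
      output;
   end;
run;

/* One row per observation: i_red, i_blue, i_green are 1 where they apply, missing elsewhere */
proc transpose data=colors out=wide(drop=_name_) prefix=i_;
   by i y;
   var one;
   id color;
run;

/* REPONLY replaces missing values (with 0 here) without standardizing */
proc stdize data=wide out=wide reponly missing=0;
   var i_:;
run;

/* Green is the reference level, so only two indicators enter the model */
proc reg data=wide;
   model y = i_red i_blue;
   color: test i_red, i_blue;   /* joint F test for the COLOR main effect */
run;
quit;

/* Check: the COLOR F test here should match the TEST result from PROC REG */
proc glm data=colors;
   class color;
   model y = color;
run;
quit;
```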


06-13-2013 02:21 PM

See--that's what a real SAS programmer, not a PROC hacker like me, can come up with.

Steve Denham


06-13-2013 02:42 PM

Thanks for the help everyone.


06-13-2013 01:25 PM

I'd actually disagree. It's useful to teach students how categorical variables are treated in regression, but they should learn both methods.

Do it in PROC REG and then compare with PROC GLM or whatever the applicable proc is.

And PROC GLM is not PROC GLMMOD: different procs, entirely different purposes.

**The GLMMOD procedure constructs the design matrix for a general linear model; it essentially constitutes the model-building front end for the GLM procedure.**
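For the suggested comparison, a sketch of the PROC GLM side, assuming data_1 holds Y and the three-level COLOR from the question. By default PROC GLM orders CLASS levels by their formatted values, so white sorts last and its estimate is set to zero. With the SOLUTION option, the blue and red estimates should then equal the i_Blue and i_Red coefficients from a PROC REG fit of Y on the hand-made indicators, and the COLOR F test should equal the corresponding REG TEST result.

```sas
proc glm data=data_1;
   class color;
   /* SOLUTION prints parameter estimates; the last level (white) is constrained to 0 */
   model y = color / solution;
run;
quit;
```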