Hi,
When regressing data that contains dummy variables, we omit one of the dummies, and the coefficient that SAS outputs for each remaining dummy is the difference between the effect of that dummy on the dependent variable and the effect of the omitted dummy. This is straightforward.
But what should be done when there are two types of dummies? Suppose that there are dummies A1-A4 and B1-B4. The A and B categories are independent of each other, so I want to omit A4 in order to study the effect of A1-A3 compared to A4, and to omit B4 in order to study the effect of B1-B3 compared to B4. But when I omit A4 and B4, how does SAS know (or how is it possible to make it know) that A4 is related only to A1-A3 and B4 only to B1-B3?
Just a little more illustration: suppose I have dummies New York, Chicago, Los Angeles and dummies Summer, Winter, Fall, Spring. If I omit Spring and Los Angeles, it is nonsensical to compare, say, Chicago with Spring and Fall with Los Angeles.
Thanks!
You should review how dummy variables represent levels of a categorical variable. Dummy variables merely indicate which of k categories each observation belongs to. Omitting the kth dummy variable avoids creating linearly dependent variables, because if an observation is not one of the first (k-1) levels, it must belong to the kth.
I recommend that you use the ideas in the link above to let SAS generate the dummy variables for you. Or better yet, avoid dummy variables and use the CLASS statement, which is easier to interpret.
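As a hedged sketch of the CLASS approach applied to the city/season example from the question: the data set name Sales and the variables y, City and Season are assumptions, not taken from the thread. The REF= option names the baseline level for each variable separately, so each City estimate is a difference from Los Angeles and each Season estimate is a difference from Spring.

```sas
/* Hypothetical data set Sales with response y and          */
/* character variables City and Season (names are assumed)  */
proc glm data=Sales;
   /* one reference level per categorical variable */
   class City(ref='Los Angeles') Season(ref='Spring');
   model y = City Season / solution;
   ods select ParameterEstimates;
quit;
```

Because the model is additive, each coefficient compares levels within its own variable while holding the other variable fixed, so a city is never compared with a season.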
What you should do depends on what you would like to do. With an intercept in the model, you only have to make sure that you leave out one dummy variable from each group. One approach would be to keep the intercept and define A1 and B1 as the base scenario. Then you would have: Intercept=1; leave out A1=spring(?); A2=1 if summer, 0 otherwise; A3=1 if autumn, 0 otherwise; A4=1 if winter, 0 otherwise; leave out B1=NY; B2=1 if Chicago, 0 otherwise; ..
Your model for proc reg (y=dep. var) would be: y = A2-A4 B2-B4 .. ;
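The manual approach can be sketched as follows, under stated assumptions: Sales, y, City and Season are hypothetical names, and Spring and Los Angeles are chosen as the omitted baselines to match the question. A logical comparison in a DATA step evaluates to 1 or 0, which gives the dummy variables directly.

```sas
/* Hypothetical input data set Sales with y, City and Season */
data SalesDummies;
   set Sales;
   /* Season dummies; Spring is the omitted baseline */
   Summer = (Season = 'Summer');
   Fall   = (Season = 'Fall');
   Winter = (Season = 'Winter');
   /* City dummies; Los Angeles is the omitted baseline */
   NewYork = (City = 'New York');
   Chicago = (City = 'Chicago');
run;

proc reg data=SalesDummies;
   /* intercept kept; one dummy left out of each group */
   model y = Summer Fall Winter NewYork Chicago;
run;
quit;
```

Here the intercept estimates the mean response for Los Angeles in Spring, and each dummy coefficient is a shift relative to the baseline of its own group.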
Hi Rick,
Glad to know that you have another blog! (I subscribed to it as well).
Just a small question,
In the example that you use, suppose that I also want to include the continuous variables height and weight (in addition to the other two categorical dummy types). In such a case, would I just have to add these variables to the model in the following way:
/* same analysis by using the CLASS statement */
proc glm data=Patients;
class sex BP_Status; /* generates dummy variables internally */
model Cholesterol = Sex BP_Status HEIGHT WEIGHT / solution;
ods select ParameterEstimates;
quit;
Thanks!
That is correct. Just add the continuous variables to the MODEL statement.
I am confused by your statement about "another blog." I only write one blog, and it is located at http://blogs.sas.com/content/iml
Silly me, I didn't realize it was the old DO LOOP blog but with a new appearance!
And thanks for the answer!
Hi Rick,
I posted a new question https://communities.sas.com/t5/SAS-Statistical-Procedures/Tip-Fixed-vs-Random-Effects-in-Panel-Data/...
It looks very similar to the question that you answered here, but could you please take a look at it, since I am not sure?
Thanks!