## Model not full rank, dummy variables

Solved
Frequent Contributor

So I created dummy variables to run a regression:

Here's how I did that:

```sas
data models (drop=weekday);
set have;
if weekday=1 then mon=1;
else mon=0;
if weekday in (2,3,4) then twt=1;
else twt=0;
if weekday=5 then fri=1;
else fri=0;
if weekday=6 then sat=1;
else sat=0;
if weekday=0 then sun=1;
else sun=0;
if Temperature <= 75 then low=1;
else low=0;
if Temperature > 75 and Temperature < 85 then mid=1;
else mid=0;
if Temperature >= 85 then high=1;
else high=0;
if Month=6 then june=1;
else june=0;
if Month=7 then july=1;
else july=0;
if Month=8 then august=1;
else august=0;
run;
```

And here's my reg code:

```sas
proc sort data=models;
by Hour;
run;

proc reg data=models;
where Month in (6,7,8);
model Load = june july august Temperature low mid high DewPoint
WindSpeed CloudCover SolarRadiation mon twt fri sat sun;
by Hour;
run;
```

Now, everything works but I get this NOTE:

Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

And this thing that I don't know how to even interpret:

Accepted Solution

## Re: Model not full rank, dummy variables

This is what happens with dummy variables. It is expected. You can't estimate the effects of ALL of the dummy variables. Why? Because if you know the values of mon, twt, fri and sat, then the value of sun is uniquely determined (sun = 1 - mon - twt - fri - sat) and adds no information beyond the intercept.
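To see the dependence concretely, here is a small sketch on hypothetical toy data (the data set name DUMMY_CHECK is made up), assuming the weekday coding from the first post (0 = Sunday through 6 = Saturday). Every row lands in exactly one day category, so the five dummies always sum to 1, the same value as the intercept column, and sun can be rebuilt from the other four.

```sas
/* Hypothetical toy data: one row per weekday code, 0 = Sunday ... 6 = Saturday, */
/* coded the same way as the dummies in the first post.                         */
data dummy_check;
   do weekday = 0 to 6;
      mon = (weekday = 1);            /* a comparison returns 1 if true, 0 if false */
      twt = (weekday in (2,3,4));
      fri = (weekday = 5);
      sat = (weekday = 6);
      sun = (weekday = 0);
      day_sum   = mon + twt + fri + sat + sun;  /* 1 on every row, like the intercept */
      sun_check = 1 - mon - twt - fri - sat;    /* reproduces sun exactly             */
      output;
   end;
run;

proc print data=dummy_check noobs;
run;
```

Because day_sum is identical to the intercept column on every row, the least-squares problem has no unique solution, which is what the "not full rank" NOTE reports.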

Now, you might be better off doing this in PROC GLM, and there are two benefits here:

1. You don't have to create the dummy variables yourself. PROC GLM can create the dummy variables for you, behind the scenes, and use them properly without you having to create them first. Temperature would be better off handled as a continuous variable. Months can be handled via dummy variables (unless the months span different years, which would be a different problem). Your weekday variable, with Tuesday, Wednesday and Thursday combined into a single dummy variable, would have to be handled by creating the dummy variables yourself; or let GLM create the dummy variables and then check whether the effects of Tuesday, Wednesday and Thursday are statistically different.
2. GLM will produce LSMEANS, which are the numbers you really really really really want to look at instead of the regression coefficients you are looking at now. And by looking at the LSMEANS instead of the regression coefficients, the issue of the not-full-rank model giving 0 coefficients also goes away.

Less work, more interpretable results, sounds like GLM is a win-win!

--
Paige Miller

Frequent Contributor

## Re: Model not full rank, dummy variables

Thank you, this is really helpful. Do you have an example of the GLM code? I need to find regression models based on these variables. I have never used GLM for regression, and from what you are saying it would be the best option.

## Re: Model not full rank, dummy variables

There are plenty of examples in the PROC GLM documentation:

http://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_glm...
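A minimal PROC GLM sketch along these lines, assuming simulated data: the data set SIM and the effects built into Load are hypothetical and exist only so the code runs on its own. The CLASS statement makes GLM build the dummy variables itself, and LSMEANS reports an adjusted mean for each weekday level with pairwise comparisons. With real data you would sort by Hour and add a BY Hour statement, as in the REG code.

```sas
/* Hypothetical simulated data so the sketch is self-contained. */
data sim;
   call streaminit(2024);
   do i = 1 to 300;
      Month       = 6 + mod(i, 3);             /* 6, 7 or 8                    */
      weekday     = mod(i, 7);                 /* 0 = Sunday ... 6 = Saturday  */
      Temperature = 70 + 20*rand('uniform');
      Load        = 1000 + 50*(Month = 7) - 80*(weekday in (0,6))
                    + 30*Temperature + rand('normal', 0, 25);
      output;
   end;
   drop i;
run;

ods output LSMeans=lsm_day;                    /* save the LSMEANS table */
proc glm data=sim;
   class Month weekday;                        /* GLM creates the dummy variables */
   model Load = Month weekday Temperature / solution;
   lsmeans weekday / stderr pdiff;             /* adjusted means + pairwise tests */
run;
quit;
```

The saved LSM_DAY data set has one row per weekday level, which is easier to report than the individual B-flagged coefficients.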

--
Paige Miller
Super User

## Re: Model not full rank, dummy variables

That message arises when one (or more) of your variables can be determined by the values of one or more other variables.

For example, with Month restricted to 6, 7 and 8, every time June or July = 1, August = 0, and every time June = 0 and July = 0, August = 1. August can be calculated from the values of June and July (august = 1 - june - july), so August isn't actually needed.

You have a similar case with the day-of-week dummies and likely the temperature dummies.

I suspect that you might be better off going to Proc GLM and providing CLASS variables with appropriate formats to create the groups based on month, day of week and temperature ranges.

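And if your Hour variable represents the time of day, SolarRadiation may cause problems in the hours of darkness: it will be 0 for every observation in those hours, so it adds nothing to the model there, and which hours are dark shifts somewhat over the season.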

The SAS regression procedures that support CLASS variables remove the need for you to create the dummy variables yourself, and they handle the redundant level for you.

Are you really sure that Monday and Friday have different effects than Tue, Wed and Thu?

Anyway, formats such as these can be used to create different groupings for a CLASS variable; only the FORMAT statement changes from one PROC run to the next:

```sas
proc format;
/* Coding from the first post: 0 = Sunday, 1 = Monday, ..., 6 = Saturday */
value weekday5_
0='Sun'
1='Mon'
2,3,4='TWT'
5='Fri'
6='Sat'
;
value weekday2_
1-5='Weekday'
0,6='Weekend'
;
run;

proc glm data=have;
class weekday;
format weekday weekday5_.;
/* <other code> */
run;

proc glm data=have;
class weekday;
format weekday weekday2_.;
/* <other code> */
run;
```

Both runs use the same code, but the CLASS variable weekday has 5 categories in the first and 2 in the second.

The rule for custom format names is that they can't end in a number, hence the trailing _ in the example.

Frequent Contributor

## Re: Model not full rank, dummy variables

Okay, so I ran GLM instead and it is more useful. I created these variables:

```sas
data want;
/* models was created with (drop=weekday), so read weekday from have instead */
set have;
if weekday=1 then week_day=1;
if weekday in (2,3,4) then week_day=2;
if weekday=5 then week_day=5;
if weekday=6 then week_day=6;
if weekday=0 then week_day=0;
if Temperature <= 75 then temp=1;
if Temperature > 75 and Temperature < 85 then temp=2;
if Temperature >= 85 then temp=3;
run;
```

Then I used this code:

```sas
proc sort data=want;
by Hour;
run;

proc glm data=want;
where Month in (6,7,8);
class Month week_day temp;
model Load = Month week_day Temperature temp DewPoint WindSpeed CloudCover SolarRadiation / solution;
by Hour;
run;
```

And I got this for Hour=0. SolarRadiation is of course 0 at this time. But I get 0 for Sunday and medium temperature. How can I interpret this if I want a regression equation?

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | -8746.49 B | 1890.251873 | -4.63 | <.0001 |
| Month 6 | -584.896 B | 198.704451 | -2.94 | 0.0033 |
| Month 7 | -165.533 B | 186.205289 | -0.89 | 0.3743 |
| Month 8 | 0 B | . | . | . |
| week_day 0 | -2786.55 B | 276.760059 | -10.07 | <.0001 |
| week_day 1 | -2041.27 B | 276.661396 | -7.38 | <.0001 |
| week_day 2 | 1677.27 B | 225.963521 | 7.42 | <.0001 |
| week_day 5 | 1781.36 B | 276.279722 | 6.45 | <.0001 |
| week_day 6 | 0 B | . | . | . |
| Temperature | 1185.31 | 43.838666 | 27.04 | <.0001 |
| temp 1 | -2049.85 B | 244.709459 | -8.38 | <.0001 |
| temp 2 | 0 B | . | . | . |
| DewPoint | 46.8325 | 36.470151 | 1.28 | 0.1995 |
| WindSpeed | 32.3243 | 44.988656 | 0.72 | 0.4727 |
| CloudCover | -34.1701 | 5.695353 | -6.00 | <.0001 |
| SolarRadiation | 0 B | . | . | . |

## Re: Model not full rank, dummy variables

@matt23 wrote:

> How can I interpret this if I want a regression equation?

If you want "interpretation", you use the LSMEANS statement in PROC GLM. If you want a regression equation, you use the values under Estimate (but you don't really want your own regression equation, do you? SAS can do the predictions for you, so you don't have to do them manually).

--
Paige Miller
Frequent Contributor

## Re: Model not full rank, dummy variables

what do you mean by that? Is there a better way of obtaining a regression equation than from 'Parameter Estimate'?

## Re: Model not full rank, dummy variables

The only time I can think of where you need to write down the regression equation is to put it in a report.

If you are going to do calculations with the regression equation, do them in SAS. Do not try to take each value under Estimate and apply them yourself.
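A sketch of letting SAS do those calculations, on hypothetical toy data (SCORE_DEMO and its simulated relationship are made up): the OUTPUT statement in PROC GLM writes the predicted values (P=) and residuals (R=) to a data set, so nobody has to retype the Estimate column.

```sas
/* Hypothetical toy data; replace with your own data set and variables. */
data score_demo;
   call streaminit(7);
   do i = 1 to 100;
      week_day    = mod(i, 7);                 /* 0 = Sunday ... 6 = Saturday */
      Temperature = 70 + 20*rand('uniform');
      Load        = 500 + 100*(week_day = 0) + 25*Temperature
                    + rand('normal', 0, 20);
      output;
   end;
   drop i;
run;

/* GLM computes the fitted values from its own estimates. */
proc glm data=score_demo;
   class week_day;
   model Load = week_day Temperature / solution;
   output out=scored p=Load_hat r=Load_resid;  /* predicted values and residuals */
run;
quit;
```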

--
Paige Miller
Frequent Contributor

## Re: Model not full rank, dummy variables

Oh I think I get it. So if it is a Sunday it puts 0* for all days of the week right? So it's just intercept + 0*all days + the rest of the equation?
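Written out from the Estimate column of the Hour=0 table posted above, the fitted equation takes the form below, where I(·) is 1 when the condition holds and 0 otherwise. Month 8, week_day 6, temp 2 and SolarRadiation do not appear because their estimates are 0 B. Note that in that table the zeroed day level is week_day 6, which the first post's coding (weekday=6 then sat) maps to Saturday; assuming that coding carried through to WANT, Sunday (week_day 0) has its own estimate of -2786.55. So "intercept + 0 for all days + the rest" describes a Saturday, and for a Sunday you would add -2786.55.

$$
\begin{aligned}
\widehat{\text{Load}} ={} & -8746.49 - 584.896\,I(\text{Month}=6) - 165.533\,I(\text{Month}=7) \\
 & - 2786.55\,I(\text{week\_day}=0) - 2041.27\,I(\text{week\_day}=1) \\
 & + 1677.27\,I(\text{week\_day}=2) + 1781.36\,I(\text{week\_day}=5) \\
 & + 1185.31\,\text{Temperature} - 2049.85\,I(\text{temp}=1) \\
 & + 46.8325\,\text{DewPoint} + 32.3243\,\text{WindSpeed} - 34.1701\,\text{CloudCover}
\end{aligned}
$$

For a Saturday in August with temp=2, every I(·) term is 0 and only the intercept and the continuous terms remain.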