So I created dummy variables to run a regression. Here's how I did that:
```sas
data models (drop=weekday);
  set have;
  /* day-of-week dummies (weekday: 0=Sun, 1=Mon, ..., 6=Sat) */
  if weekday=1 then mon=1; else mon=0;
  if weekday in (2,3,4) then twt=1; else twt=0;
  if weekday=5 then fri=1; else fri=0;
  if weekday=6 then sat=1; else sat=0;
  if weekday=0 then sun=1; else sun=0;
  /* temperature-band dummies */
  if Temperature <= 75 then low=1; else low=0;
  if Temperature > 75 and Temperature < 85 then mid=1; else mid=0;
  if Temperature >= 85 then high=1; else high=0;
  /* month dummies */
  if Month=6 then june=1; else june=0;
  if Month=7 then july=1; else july=0;
  if Month=8 then august=1; else august=0;
run;
```
And here's my reg code:
```sas
proc sort data=models;
  by Hour;
run;

proc reg data=models;
  where Month in (6,7,8);
  model Load = june july august Temperature low mid high DewPoint WindSpeed
               CloudCover SolarRadiation mon twt fri sat sun;
  by Hour;
run;
quit;
```
Now, everything works but I get this NOTE:
Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
And this, which I don't even know how to interpret:
This is what happens with dummy variables, and it is expected. You can't estimate the effects of ALL of the dummy variables. Why? Because if you know the values of mon, twt, fri and sat, then the value of sun is uniquely determined (sun = 1 - mon - twt - fri - sat) and adds no information beyond the intercept.
Now, you might be better off doing this in PROC GLM, which has two benefits here: less work (the CLASS statement creates the dummy variables for you) and more interpretable results.
Sounds like GLM is a win-win!
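If you want to stay with PROC REG, the usual fix is to leave one dummy out of each group; the omitted level becomes the baseline that the intercept absorbs. A minimal sketch, assuming the MODELS data set from the question (already sorted by Hour) and picking august, sun and mid as the reference levels purely for illustration; any one level per group would do. The output data set name REG_EST is made up:

```sas
/* Assumes MODELS exists and is sorted by Hour, as in the question. */
proc reg data=models outest=reg_est;   /* REG_EST: estimates, one row per Hour */
  where Month in (6,7,8);
  /* One dummy per group is omitted (august, sun, mid). Each omitted
     level is the baseline, so e.g. the mon coefficient is the
     difference between Monday and Sunday, other variables held fixed. */
  model Load = june july Temperature low high DewPoint WindSpeed
               CloudCover SolarRadiation mon twt fri sat;
  by Hour;
run;
quit;
```

In BY groups where SolarRadiation is always 0 (such as Hour=0), that variable will still be flagged, since a column of zeros carries no information.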
There are plenty of examples in the PROC GLM documentation.
That message arises when one (or more) of your variables can be determined by the values of one or more other variables.
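For example, because of WHERE Month in (6,7,8), every time june or july = 1, august = 0, and every time june = 0 and july = 0, august = 1. So august = 1 - june - july can be calculated from the values of june and july, and august isn't actually needed.
You have the same situation with the day-of-week dummies (mon, twt, fri, sat and sun always add up to 1) and with the temperature dummies (low, mid and high always add up to 1).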
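A quick way to see these dependencies in your own data is to add up each group of dummies. A sketch using the MODELS data set from the question; the data set name DUMMY_CHECK and the sum variables are made up for illustration:

```sas
/* Each group of dummies should sum to 1 on every row of the subset. */
data dummy_check;
  set models;
  where Month in (6,7,8);
  month_sum = june + july + august;
  day_sum   = mon + twt + fri + sat + sun;
  temp_sum  = low + mid + high;
run;

/* Expect only the value 1 in each table (for day_sum, assuming
   weekday was always coded 0-6 in HAVE). */
proc freq data=dummy_check;
  tables month_sum day_sum temp_sum;
run;
```

A sum that is the constant 1 is the same column as the intercept, which is exactly the linear combination the NOTE reports.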
I suspect that you might be better off going to Proc GLM and providing CLASS variables with appropriate formats to create the groups based on month, day of week and temperature ranges.
And if your Hour variable represents time of day, SolarRadiation may cause problems: during the hours of darkness (which shift somewhat over the season) it is always 0, so it adds nothing to the model for those hours.
SAS regression procedures that support a CLASS statement remove the need for you to create the dummy variables yourself, and they take care of the redundant level for you (in PROC GLM its estimate is set to 0 and flagged B).
Are you really sure that Monday and Friday have different effects than Tue, Wed and Thu?
Anyway, formats such as

```sas
proc format;
  /* weekday is coded 0=Sun, 1=Mon, ..., 6=Sat in the data */
  value weekday5_ 1='Mon' 2,3,4='TWT' 5='Fri' 6='Sat' 0='Sun';
  value weekday2_ 1-5='Weekday' 0,6='Weekend';
run;
```

could be used to provide different groups when using a class variable, changing only the format between procedure runs:

```sas
proc glm data=have;
  class weekday;
  format weekday weekday5_.;   /* 5 groups */
  /* <other code> */
run;
quit;

proc glm data=have;
  class weekday;
  format weekday weekday2_.;   /* 2 groups: weekday vs. weekend */
  /* <other code> */
run;
quit;
```

This runs the same code, but with the class variable weekday grouped into 5 categories in the first run and 2 in the second.
The rule for custom format names is that they can't end in a number, hence the trailing _ in the names.
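The same format approach covers the temperature ranges as well. A sketch, assuming the HAVE data set from the question; the format name tempband_, the data set WANT2 and the variable temp_band are made up for illustration. Temperature is copied into temp_band because a variable listed in the CLASS statement is treated as categorical everywhere in the model, so Temperature itself could no longer be used as a continuous effect:

```sas
/* Cut points mirror the question: <=75, between 75 and 85, >=85. */
proc format;
  value tempband_
    low-75    = 'Low'
    75<-<85   = 'Mid'
    85-high   = 'High';
run;

data want2;
  set have;
  temp_band = Temperature;   /* copy, so Temperature stays continuous */
run;

proc sort data=want2;
  by Hour;
run;

proc glm data=want2;
  where Month in (6,7,8);
  /* CLASS levels come from the formatted values, so temp_band
     has (at most) three levels: Low, Mid, High */
  class Month weekday temp_band;
  format temp_band tempband_.;
  model Load = Month weekday temp_band Temperature DewPoint WindSpeed
               CloudCover SolarRadiation / solution;
  by Hour;
run;
quit;
```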
Okay, so I ran GLM instead, and it is more useful. I used these variables:
```sas
data want;
  set models;
  /* grouped day of week */
  if weekday=1 then week_day=1;
  if weekday in (2,3,4) then week_day=2;
  if weekday=5 then week_day=5;
  if weekday=6 then week_day=6;
  if weekday=0 then week_day=0;
  /* temperature band */
  if Temperature <= 75 then temp=1;
  if Temperature > 75 and Temperature < 85 then temp=2;
  if Temperature >= 85 then temp=3;
run;
```
I used this code:
```sas
proc sort data=want;
  by Hour;
run;

proc glm data=want;
  where Month in (6,7,8);
  class Month week_day temp;
  model Load = Month week_day Temperature temp DewPoint WindSpeed
               CloudCover SolarRadiation / solution;
  by Hour;
run;
quit;
```
And I got this for Hour=0. SolarRadiation is of course 0 at this time. But I get 0 for Sunday and for medium temperature. How can I interpret this if I want a regression equation?
| Parameter | Estimate | | Standard Error | t Value | Pr > \|t\| |
|---|---:|:---:|---:|---:|---:|
| Intercept | -8746.485035 | B | 1890.251873 | -4.63 | <.0001 |
| Month 6 | -584.895553 | B | 198.704451 | -2.94 | 0.0033 |
| Month 7 | -165.533354 | B | 186.205289 | -0.89 | 0.3743 |
| Month 8 | 0.000000 | B | . | . | . |
| week_day 0 | -2786.546010 | B | 276.760059 | -10.07 | <.0001 |
| week_day 1 | -2041.270203 | B | 276.661396 | -7.38 | <.0001 |
| week_day 2 | 1677.265795 | B | 225.963521 | 7.42 | <.0001 |
| week_day 5 | 1781.359362 | B | 276.279722 | 6.45 | <.0001 |
| week_day 6 | 0.000000 | B | . | . | . |
| Temperature | 1185.308015 | | 43.838666 | 27.04 | <.0001 |
| temp 1 | -2049.847453 | B | 244.709459 | -8.38 | <.0001 |
| temp 2 | 0.000000 | B | . | . | . |
| DewPoint | 46.832461 | | 36.470151 | 1.28 | 0.1995 |
| WindSpeed | 32.324264 | | 44.988656 | 0.72 | 0.4727 |
| CloudCover | -34.170070 | | 5.695353 | -6.00 | <.0001 |
| SolarRadiation | 0.000000 | B | . | . | . |
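Reading the Estimate column row by row gives the fitted equation for Hour=0. The levels whose estimate is 0 (Month 8, week_day 6 and temp 2) are the reference levels: their effect is folded into the intercept, and every other level's estimate is a difference from that reference. Note that with the coding in the data step above, week_day 6 is Saturday, not Sunday, and temp has no level 3 in this output, presumably because no Hour=0 observation in June to August reached 85. SolarRadiation drops out because it is 0 at this hour. Written with indicator terms $\mathbb{1}[\cdot]$ (1 when the condition holds, else 0) and estimates rounded to three decimals:

$$
\begin{aligned}
\widehat{\text{Load}} ={}& -8746.485 - 584.896\,\mathbb{1}[\text{Month}=6] - 165.533\,\mathbb{1}[\text{Month}=7] \\
&- 2786.546\,\mathbb{1}[\text{week\_day}=0] - 2041.270\,\mathbb{1}[\text{week\_day}=1] \\
&+ 1677.266\,\mathbb{1}[\text{week\_day}=2] + 1781.359\,\mathbb{1}[\text{week\_day}=5] \\
&- 2049.847\,\mathbb{1}[\text{temp}=1] + 1185.308\,\text{Temperature} \\
&+ 46.832\,\text{DewPoint} + 32.324\,\text{WindSpeed} - 34.170\,\text{CloudCover}
\end{aligned}
$$

For example, on a Saturday in August with temp level 2, every indicator is 0 and only the intercept and the continuous terms remain. Because of the B flags, the individual level estimates depend on which level SAS picked as the reference; the fitted values do not.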
In reply to @matt23:
If you want interpretation, use the LSMEANS statement in PROC GLM. If you want a regression equation, use the values in the Estimate column (but you don't really want your own regression equation, do you? SAS can do the predictions for you, so you don't have to do it manually).
The only time I can think of where you need to write down the regression equation is to put it in a report.
If you are going to do calculations with the regression equation, do them in SAS. Do not try to take each value in the Estimate column and apply it yourself.
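A minimal sketch of letting SAS do both, reusing the WANT data set and model from the question (assumed sorted by Hour); the output data set PREDS and the variables pred_load and resid_load are made-up names:

```sas
/* Assumes WANT exists and is sorted by Hour, as in the question. */
proc glm data=want;
  where Month in (6,7,8);
  class Month week_day temp;
  model Load = Month week_day Temperature temp DewPoint WindSpeed
               CloudCover SolarRadiation / solution;
  /* Adjusted (least-squares) mean Load for each level, with pairwise
     differences, instead of reading the B-flagged estimates */
  lsmeans week_day temp / pdiff;
  /* SAS computes the predicted value and residual for each observation */
  output out=preds p=pred_load r=resid_load;
  by Hour;
run;
quit;
```

The LS-means are computed at the average values of the continuous covariates, and unlike the individual estimates they do not depend on which level SAS chose as the reference.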