How can I solve this problem?
I have two responses, y1 and y2, and 8 independent variables.
I'm trying to run a multiple linear regression on the data using PROC REG, on either y1 or y2.
Here is my code, followed by the output and the notes I got:
PROC IMPORT DATAFILE="/folders/myfolders/sasuser.v94/ENB2012_data.xlsx"
OUT=energy
DBMS=XLSX
REPLACE;
RUN;
PROC PRINT DATA=energy; RUN;
proc reg data=energy ;
model y1= x1 x2 x3 x4 x5 x6 x7 x8 / vif collin;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: Y1 Y1
Number of Observations Read | 1296 |
Number of Observations Used | 768 |
Number of Observations with Missing Values | 528 |
Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 7 | 71546 | 10221 | 1187.06 | <.0001 |
Error | 760 | 6543.77041 | 8.61022 | | |
Corrected Total | 767 | 78090 | | | |
Root MSE | 2.93432 | R-Square | 0.9162 |
Dependent Mean | 22.30720 | Adj R-Sq | 0.9154 |
Coeff Var | 13.15413 | | |
Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
X4 = 2.97E-8 * Intercept - 1.63E-8 * X1 + 0.5 * X2 - 0.5 * X3 + 342E-12 * X5
Parameter Estimates
Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| | Variance Inflation |
Intercept | Intercept | B | 84.01342 | 19.03361 | 4.41 | <.0001 | 0 |
X1 | X1 | B | -64.77343 | 10.28945 | -6.30 | <.0001 | 105.52405 |
X2 | X2 | B | -0.08729 | 0.01708 | -5.11 | <.0001 | 201.53113 |
X3 | X3 | B | 0.06081 | 0.00665 | 9.15 | <.0001 | 7.49298 |
X4 | X4 | 0 | 0 | . | . | . | . |
X5 | X5 | B | 4.16995 | 0.33799 | 12.34 | <.0001 | 31.20547 |
X6 | X6 | 1 | -0.02333 | 0.09470 | -0.25 | 0.8055 | 1.00000 |
X7 | X7 | 1 | 19.93274 | 0.81399 | 24.49 | <.0001 | 1.04751 |
X8 | X8 | 1 | 0.20378 | 0.06992 | 2.91 | 0.0037 | 1.0475 |
I also tried creating dummy variables for my 2 categorical variables (x6 and x8), as follows, but I still get the same notes:
DATA energy1;
SET energy;
IF (x6 = 2) THEN d61 = 1; ELSE d61 = 0;
IF (x6 = 3) THEN d62 = 1; ELSE d62 = 0;
IF (x6 = 4) THEN d63 = 1; ELSE d63 = 0;
IF (x6 = 5) THEN d64 = 1; ELSE d64 = 0;
IF (x8 = 0) THEN d81 = 1; ELSE d81 = 0;
IF (x8 = 1) THEN d82 = 1; ELSE d82 = 0;
IF (x8 = 2) THEN d83 = 1; ELSE d83 = 0;
IF (x8 = 3) THEN d84 = 1; ELSE d84 = 0;
IF (x8 = 4) THEN d85 = 1; ELSE d85 = 0;
IF (x8 = 5) THEN d86 = 1; ELSE d86 = 0;
run;
See the attached data, or download it from https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx
@alot4315 wrote:
(How can I solve this problem):
proc reg data=energy ;
model y1= x1 x2 x3 x4 x5 x6 x7 x8 / vif collin;
run;
Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
X4 = 2.97E-8 * Intercept - 1.63E-8 * X1 + 0.5 * X2 - 0.5 * X3 + 342E-12 * X5
You can't have x4 in the model together with x1, x2, x3 and x5. Why? Because x4 does not provide any additional information: if you know x1, x2, x3 and x5, then you know x4 exactly. In the note above, the coefficients on the intercept, x1 and x5 are numerically zero, so in effect x4 = 0.5 * x2 - 0.5 * x3. The solution is to remove x4 from the model.
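As a minimal sketch of that fix, assuming the ENERGY data set created by the PROC IMPORT step in the question: the first two steps check the dependency that PROC REG reported, and the last step refits the model without x4. The data set name CHECK and the variable DIFF are only illustrative names, not part of the original post.

```sas
/* Illustrative check: if x4 = 0.5*x2 - 0.5*x3 holds on every row,   */
/* DIFF should be 0 (up to rounding) for all non-missing rows.       */
data check;
   set energy;
   diff = x4 - 0.5*(x2 - x3);
run;

proc means data=check n min max;
   var diff;
run;

/* Refit the same model with x4 removed */
proc reg data=energy;
   model y1 = x1 x2 x3 x5 x6 x7 x8 / vif;
run;
quit;
```

If the minimum and maximum of DIFF are both essentially 0, the dependency is exact and dropping x4 loses no information. The VIF values for x1 and x2 in the original output are above 100, so it is worth checking the VIF column again after the refit; strong collinearity may remain even once the model is full rank.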
This feature of the data results from its artificial nature. See the paper describing the data set:
http://people.maths.ox.ac.uk/tsanas/Preprints/ENB2012.pdf
Multiple regression is not a promising analysis tool for such data. You should try something conceptually simpler first, such as regression trees (PROC HPSPLIT).
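A rough starting point for such a tree might look like the following. This is only a sketch under assumptions: it uses the ENERGY data set from the question, treats x6 and x8 as categorical as the original post does, leaves all growing and pruning settings at their defaults, and the seed value is arbitrary.

```sas
/* Sketch of a regression tree for y1.                               */
/* y1 is not listed in the CLASS statement, so PROC HPSPLIT treats   */
/* it as a continuous response and builds a regression tree.         */
proc hpsplit data=energy seed=123;
   class x6 x8;
   model y1 = x1 x2 x3 x4 x5 x6 x7 x8;
run;
```

Unlike PROC REG, a tree does not need a full-rank design, so x4 can stay in the predictor list here. The same step can be repeated with y2 as the response.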