alot4315
Calcite | Level 5

(How can I solve this problem):

 

I have two responses y1, y2 and 8 independent variables.

I'm trying to run a multiple linear regression on the data using PROC REG, with either y1 or y2 as the response:

 

And I got the following notes:

 

PROC IMPORT DATAFILE="/folders/myfolders/sasuser.v94/ENB2012_data.xlsx"
            OUT=energy
            DBMS=XLSX
            REPLACE;
RUN;

PROC PRINT DATA=energy; RUN;

proc reg data=energy;
   model y1 = x1 x2 x3 x4 x5 x6 x7 x8 / vif collin;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: Y1 Y1

Number of Observations Read                   1296
Number of Observations Used                    768
Number of Observations with Missing Values     528

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               7            71546         10221   1187.06   <.0001
Error             760       6543.77041       8.61022
Corrected Total   767            78090

Root MSE          2.93432    R-Square   0.9162
Dependent Mean   22.30720    Adj R-Sq   0.9154
Coeff Var        13.15413

Note: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

Note: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

X4 = 2.97E-8 * Intercept - 1.63E-8 * X1 + 0.5 * X2 - 0.5 * X3 + 342E-12 * X5

 

Parameter Estimates

Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   Intercept   B              84.01342         19.03361      4.41     <.0001                    0
X1          X1          B             -64.77343         10.28945     -6.30     <.0001            105.52405
X2          X2          B              -0.08729          0.01708     -5.11     <.0001            201.53113
X3          X3          B               0.06081          0.00665      9.15     <.0001              7.49298
X4          X4          0                     0                .         .          .                    .
X5          X5          B               4.16995          0.33799     12.34     <.0001             31.20547
X6          X6          1              -0.02333          0.09470     -0.25     0.8055              1.00000
X7          X7          1              19.93274          0.81399     24.49     <.0001              1.04751
X8          X8          1               0.20378          0.06992      2.91     0.0037               1.0475

 

I also tried creating dummy variables, since I have two categorical variables (x6 and x8), as follows, but I still get the same notes:

 

DATA energy1;
SET energy;
IF (x6 = 2) THEN d61 = 1; ELSE d61 = 0;
IF (x6 = 3) THEN d62 = 1; ELSE d62 = 0;
IF (x6 = 4) THEN d63 = 1; ELSE d63 = 0;
IF (x6 = 5) THEN d64 = 1; ELSE d64 = 0;


IF (x8 = 0) THEN d81 = 1; ELSE d81 = 0;
IF (x8 = 1) THEN d82 = 1; ELSE d82 = 0;
IF (x8 = 2) THEN d83 = 1; ELSE d83 = 0;
IF (x8 = 3) THEN d84 = 1; ELSE d84 = 0;
IF (x8 = 4) THEN d85 = 1; ELSE d85 = 0;
IF (x8 = 5) THEN d86 = 1; ELSE d86 = 0;
run;

 

 

See the attached data, or download it from https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx

2 REPLIES
PaigeMiller
Diamond | Level 26

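You can't have x4 in the model together with x1, x2, x3 and x5. Why? Because x4 provides no additional information: if you know x1, x2, x3 and x5, then you know x4 exactly. In the note from PROC REG, the coefficients on the intercept, x1 and x5 are numerically zero (on the order of 1E-8 or smaller), so the relationship is effectively x4 = 0.5*x2 - 0.5*x3. So the solution is to remove x4 from the model.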
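
A minimal sketch of that fix, assuming the energy data set created by the PROC IMPORT step in the original post. The data set name check and the variable diff are illustrative names, not part of the original thread. The first two steps only confirm the dependency reported in the note; the last step refits the model without x4.

```sas
/* Confirm the dependency from the PROC REG note:            */
/* diff should be (numerically) zero on every complete row.  */
data check;                          /* "check" is a hypothetical name */
   set energy;
   diff = x4 - 0.5*(x2 - x3);        /* "diff" is a hypothetical name  */
run;

proc means data=check n min max;
   var diff;
run;

/* Refit the same model with x4 removed. */
proc reg data=energy;
   model y1 = x1 x2 x3 x5 x6 x7 x8 / vif collin;
run;
quit;
```

Dropping x4 should make the "not full rank" note go away, but the large variance inflation values for x1 and x2 in the posted output suggest that strong (though not exact) collinearity among the remaining predictors is still worth examining.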

--
Paige Miller
PGStats
Opal | Level 21

This linear dependency among the predictors results from the artificial nature of the data, which is described here:

 

http://people.maths.ox.ac.uk/tsanas/Preprints/ENB2012.pdf

 

Multiple regression is not a promising analysis tool for such data. You should try something conceptually simpler first, such as regression trees (proc hpsplit).
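
A minimal sketch of such a regression tree, assuming the energy data set from the original post. Declaring x6 and x8 as CLASS variables follows the original poster's description of them as categorical; the seed value and the cost-complexity pruning choice are arbitrary illustrations, not recommendations from this thread.

```sas
/* Regression tree for y1: because y1 is not listed in CLASS, */
/* PROC HPSPLIT treats it as an interval target.              */
proc hpsplit data=energy seed=123;   /* seed value is arbitrary        */
   class x6 x8;                      /* assumed categorical predictors */
   model y1 = x1 x2 x3 x4 x5 x6 x7 x8;
   prune costcomplexity;             /* illustrative pruning method    */
run;
```

Unlike PROC REG, a tree splits on one variable at a time, so the exact linear dependency among x2, x3 and x4 does not prevent the model from being fit, and x4 can stay in the variable list.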

PG
