mthorne
Obsidian | Level 7

I have a question regarding the interpretation of polynomial regression. In the quadratic model below, all coefficients are significant. In the cubic model, however, the linear and squared coefficients are not significant, but the cubic coefficient is. The cubic model is a slightly better fit than the quadratic, but I am not sure whether it should be used given the significance issues. I would greatly appreciate any thoughts on this matter. Thanks!

 

FYI: day2 = day*day; day3=day*day*day.

 

*Quadratic model*;
ods graphics on;
proc reg data=no2019;
  by year;
  model sps = day day2 / lackfit;
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
ods graphics off;

 

The REG Procedure
Model: MODEL1
Dependent Variable: SPS
YEAR=17

Number of Observations Read    450
Number of Observations Used    450

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1375.32618      687.66309     192.34    <.0001
Error             447        1598.17313        3.57533
Lack of Fit         2          17.24169        8.62084       2.43    0.0895
Pure Error        445        1580.93144        3.55265
Corrected Total   449        2973.49931

Root MSE           1.89085    R-Square    0.4625
Dependent Mean     4.85756    Adj R-Sq    0.4601
Coeff Var         38.92605

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1               7.56219           0.20706      36.52      <.0001
DAY           1              -0.27141           0.04139      -6.56      <.0001
day2          1               0.00320           0.00149       2.15      0.0321



mthorne_0-1700680336347.png

 

 

 

 

*cubic model*;
ods graphics on;
proc reg data=no2019;
  by year;
  model sps = day day2 day3 / lackfit;
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
ods graphics off;

 

 

The REG Procedure
Model: MODEL1
Dependent Variable: SPS
YEAR=17

Number of Observations Read    450
Number of Observations Used    450

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        1391.51323      463.83774     130.77    <.0001
Error             446        1581.98608        3.54705
Lack of Fit         1           1.05463        1.05463       0.30    0.5861
Pure Error        445        1580.93144        3.55265
Corrected Total   449        2973.49931

Root MSE           1.88336    R-Square    0.4680
Dependent Mean     4.85756    Adj R-Sq    0.4644
Coeff Var         38.77181

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1               7.22585           0.25946      27.85      <.0001
DAY           1              -0.07735           0.09976      -0.78      0.4385
day2          1              -0.01591           0.00907      -1.75      0.0800
day3          1            0.00047628        0.00022295       2.14      0.0332



mthorne_1-1700680336351.png

 

 

4 REPLIES
Laser_Taco_
Fluorite | Level 6
Hello,

What exactly is the purpose of the model? Are you looking to describe the fit or use the model for prediction?

mthorne
Obsidian | Level 7

Hi,

 

The purpose is to fit a curve that can be used to estimate or determine when 50% of the maximum average was reached. Here is a graph using the quadratic model.

 

 
mthorne_0-1700682889382.png

 

 

mthorne_6-1700682827380.png
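
For the stated goal, one possible way to read the half-maximum day off the fitted quadratic is to evaluate the curve on a fine grid. This is only a sketch under several assumptions: "maximum" is taken to mean the largest fitted value on the grid, the curve is assumed to fall after its peak, and the macro variables day_lo and day_hi, the data sets est and half_max, and the names b1, b2, d, d_max, yhat_max, and day_at_half are all made up for illustration. The grid limits shown are placeholders and should be set to the observed range of day.

```sas
/* Hypothetical grid limits: replace with the observed range of DAY */
%let day_lo = 0;
%let day_hi = 30;

/* Save the quadratic coefficients, one row per YEAR */
proc reg data=no2019 outest=est noprint;
  by year;
  model sps = day day2;
run;
quit;

data half_max(keep=year yhat_max day_at_half);
  /* Rename the coefficient columns so they do not clash with the grid variable */
  set est(rename=(day=b1 day2=b2));

  /* Pass 1: largest fitted value on the grid and the day where it occurs */
  yhat_max = .;
  do d = &day_lo to &day_hi by 0.1;
    yhat = intercept + b1*d + b2*d**2;
    if yhat > yhat_max then do;
      yhat_max = yhat;
      d_max = d;
    end;
  end;

  /* Pass 2: first grid day at or after the peak where the curve is at most half of it */
  day_at_half = .;
  do d = d_max to &day_hi by 0.1 while (missing(day_at_half));
    yhat = intercept + b1*d + b2*d**2;
    if yhat <= 0.5*yhat_max then day_at_half = d;
  end;
run;
```

If the curve never drops to half of its peak inside the grid, day_at_half stays missing. The same two passes work for the cubic model after adding day3 to the MODEL statement and a b3*d**3 term to both yhat expressions.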

 

Laser_Taco_
Fluorite | Level 6
Hi,

My response will be somewhat general, as I would need a bit more information to give a more specific answer.

For the purpose of estimating when 50% of the maximum response is reached, fitting the highest-order polynomial whose highest-order term is significant should give you the best estimate for your data.
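
To make "highest-order term is significant" concrete, the extra-sum-of-squares F test for adding day3 can be worked out from the two ANOVA tables posted above (YEAR=17). The subscripts quad and cubic are just labels for the two models:

```latex
\[
F \;=\; \frac{\left(SSE_{\text{quad}} - SSE_{\text{cubic}}\right) / \left(df_{\text{quad}} - df_{\text{cubic}}\right)}{MSE_{\text{cubic}}}
  \;=\; \frac{(1598.17313 - 1581.98608)/(447 - 446)}{3.54705}
  \;=\; \frac{16.18705}{3.54705}
  \;\approx\; 4.56
\]
\[
t_{\text{day3}}^{2} \;=\; \left(\frac{0.00047628}{0.00022295}\right)^{2} \;\approx\; (2.136)^{2} \;\approx\; 4.56
\]
```

The two agree, so the t-test on day3 (Pr > |t| = 0.0332) is the same test as the comparison of the cubic model against the quadratic one.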

Emphasis on _your data_: an important concept is that you want to fit the simplest model that accounts for most of the variance. Increasing the order of the polynomial can lead to overfitting and limit the model's ability to generalize to other, similar data.
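
A rough way to see whether the extra term buys anything beyond the data at hand is to compare out-of-sample style statistics for the two models. The sketch below assumes the same no2019 data set and variables as in the question; the labels quad and cubic and the data set name fitstats are made up for illustration. PRESS is the sum of squared leave-one-out prediction residuals, and AIC and SBC penalize the extra parameter, so smaller is better for all three.

```sas
/* Fit both models in one step and save their fit statistics to OUTEST= */
proc reg data=no2019 outest=fitstats noprint;
  by year;
  quad:  model sps = day day2      / press aic sbc;
  cubic: model sps = day day2 day3 / press aic sbc;
run;
quit;

/* One row per year and model: compare _PRESS_, _AIC_, and _SBC_ side by side */
proc print data=fitstats noobs;
  var year _model_ _rmse_ _press_ _aic_ _sbc_;
run;
```

If the cubic model wins only marginally on these statistics, that supports staying with the simpler quadratic.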

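As for the change in the significance of the linear term: day, day2, and day3 are typically strongly correlated with one another, so once the cubic term is in the model the lower-order terms share most of their information with it and their standard errors grow (for DAY, from 0.04139 in the quadratic model to 0.09976 in the cubic one). The t-tests on the lower-order coefficients then say little on their own; the one that matters for choosing between the two models is the test on the highest-order term.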
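
One way to check this, sketched here under the assumption that the data set and variables are as in the question, is to center day before forming its powers and to ask PROC REG for variance inflation factors. The names no2019c, dayc, dayc2, and dayc3 are made up for illustration.

```sas
/* Center DAY within each year; PROC SQL remerges the yearly mean onto every row */
proc sql;
  create table no2019c as
  select *, day - mean(day) as dayc
  from no2019
  group by year
  order by year, day;
quit;

/* Powers of the centered variable */
data no2019c;
  set no2019c;
  dayc2 = dayc**2;
  dayc3 = dayc**3;
run;

/* Refit the cubic model and print variance inflation factors */
proc reg data=no2019c;
  by year;
  model sps = dayc dayc2 dayc3 / vif;
run;
quit;
```

Centering does not change the fitted values or the test on the cubic term. It only re-expresses the lower-order coefficients as the slope and curvature at the mean day, which usually lowers their correlation with the higher powers.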

If you want to describe *only* your data, you are fine to use the cubic model.

If you want to extrapolate to other data sets (making inferences, prediction, etc.), you'd be better off with the quadratic model.

Regards,

Kyle

mthorne
Obsidian | Level 7

Thank you, Kyle!  That makes good sense.

 

Mark
