Statistical Procedures

mthorne · Posted 11-22-2023 02:19 PM

I have a question about interpreting polynomial regression. In the quadratic model below, all coefficients are significant. In the cubic model, however, the linear and squared coefficients are not significant, while the cubic coefficient is. The cubic model fits slightly better than the quadratic, but I am not sure whether it should be used given the significance issues. I would greatly appreciate any thoughts on this matter. Thanks!

FYI: day2 = day*day; day3=day*day*day.

*Quadratic model*;
ods graphics on;
proc reg data=no2019; by year;
  model sps = day day2 / lackfit;
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
ods graphics off;

The REG Procedure

Model: MODEL1

Dependent Variable: SPS

YEAR=17

Number of Observations Read	450
Number of Observations Used	450

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	1375.32618	687.66309	192.34	<.0001
Error	447	1598.17313	3.57533
Lack of Fit	2	17.24169	8.62084	2.43	0.0895
Pure Error	445	1580.93144	3.55265
Corrected Total	449	2973.49931

Root MSE	1.89085	R-Square	0.4625
Dependent Mean	4.85756	Adj R-Sq	0.4601
Coeff Var	38.92605

Parameter Estimates
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	7.56219	0.20706	36.52	<.0001
DAY	1	-0.27141	0.04139	-6.56	<.0001
day2	1	0.00320	0.00149	2.15	0.0321

*Cubic model*;
ods graphics on;
proc reg data=no2019; by year;
  model sps = day day2 day3 / lackfit;
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
ods graphics off;

The REG Procedure

Model: MODEL1

Dependent Variable: SPS

YEAR=17

Number of Observations Read	450
Number of Observations Used	450

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	1391.51323	463.83774	130.77	<.0001
Error	446	1581.98608	3.54705
Lack of Fit	1	1.05463	1.05463	0.30	0.5861
Pure Error	445	1580.93144	3.55265
Corrected Total	449	2973.49931

Root MSE	1.88336	R-Square	0.4680
Dependent Mean	4.85756	Adj R-Sq	0.4644
Coeff Var	38.77181

Parameter Estimates
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	7.22585	0.25946	27.85	<.0001
DAY	1	-0.07735	0.09976	-0.78	0.4385
day2	1	-0.01591	0.00907	-1.75	0.0800
day3	1	0.00047628	0.00022295	2.14	0.0332

Laser_Taco_ · Posted 11-22-2023 02:45 PM

Hello,

What exactly is the purpose of the model? Are you looking to describe the fit or use the model for prediction?

mthorne · Posted 11-22-2023 02:54 PM

Hi,

The purpose is to fit a curve that can be used to estimate or determine when 50% of the maximum average was reached. Here is a graph using the quadratic model.

Laser_Taco_ · Posted 11-22-2023 03:41 PM

Hi,

My response will be somewhat general, as I would need a bit more information to give a more specific answer.

For the purpose of estimating when 50% of the maximum response is reached, fitting the highest-order polynomial whose top term is significant should give you the best estimate for your data.
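For the 50% question itself, one possible approach is to score the fitted curve on a fine grid of days and pick the first day at which it drops to half of its maximum. This is only a sketch under stated assumptions: the coefficients are the YEAR=17 quadratic estimates from the output above, the 0 to 40 range for day is a hypothetical choice, "maximum" is taken to mean the largest fitted value on that grid, and the data set names `grid` and `half_point` are made up for illustration.

```sas
/* Assumption: day runs from 0 to 40; replace with the observed range. */
data grid;
  do day = 0 to 40 by 0.1;
    /* YEAR=17 quadratic fit copied from the Parameter Estimates table */
    yhat = 7.56219 - 0.27141*day + 0.00320*day**2;
    output;
  end;
run;

/* Assumption: "maximum" = largest fitted value on the grid */
proc sql noprint;
  select max(yhat) into :ymax trimmed from grid;
quit;

/* Keep the first grid day at or below half of that maximum */
data half_point;
  set grid;
  if yhat <= 0.5 * &ymax then do;
    output;
    stop;
  end;
run;

proc print data=half_point;
run;
```

The same steps work for the cubic fit by swapping in its four coefficients, which makes it easy to see how much the choice of model actually moves the estimated day.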

Emphasis on _your data_: an important principle is that you want to fit the simplest model that accounts for most of the variance. Increasing the order of the polynomial can lead to overfitting and limit the model's ability to generalize to other, similar data.
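One way to put a number on what the cubic term buys you is a partial F test of day3 given day and day2. From the two ANOVA tables above, F = (1598.17313 - 1581.98608) / 3.54705 ≈ 4.56 on 1 and 446 df, which is the square of the day3 t value and so matches its p-value of 0.0332. A minimal sketch of asking PROC REG for the same test directly, reusing the data set and variable names from the original post (the label `cubic_term` is just an illustrative name):

```sas
proc reg data=no2019;
  by year;
  model sps = day day2 day3;
  /* Partial F test: does day3 add anything beyond day and day2? */
  cubic_term: test day3 = 0;
run;
quit;
```

Set against the small change in R-Square (0.4625 to 0.4680), this shows that the gain from the cubic term is statistically detectable but modest.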

As for the change in the significance of the linear term: it is consistent with the tighter fit to your data once the cubic term is included, which leaves the linear coefficient with little importance of its own.
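A common reason lower-order terms lose significance when a higher power is added is that day, day2, and day3 are strongly correlated with one another. The sketch below shows one way to check this with variance inflation factors and to reduce it by centering day before forming the powers. The names `no2019c`, `dayc`, `dayc2`, and `dayc3` are hypothetical, and centering within each year is an assumption about what is wanted here.

```sas
/* Variance inflation factors for the raw powers of day */
proc reg data=no2019;
  by year;
  model sps = day day2 day3 / vif;
run;
quit;

/* Center day within each year (PROC SQL remerges the group mean) */
proc sql;
  create table no2019c as
  select *, day - mean(day) as dayc
  from no2019
  group by year
  order by year;
quit;

/* Rebuild the powers from the centered variable */
data no2019c;
  set no2019c;
  dayc2 = dayc**2;
  dayc3 = dayc**3;
run;

proc reg data=no2019c;
  by year;
  model sps = dayc dayc2 dayc3 / vif;
run;
quit;
```

Centering changes the lower-order coefficients and their t tests, but the fitted values and the test of the highest-order term stay the same, so the individual p-values for the linear and squared terms are not a good basis for choosing between the two models.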

If you want to describe *only* your data, you are fine to use the cubic model.

If you want to extrapolate to other data sets (making inferences, prediction, etc.), you'd be better off with the quadratic model.

Regards,

Kyle

mthorne · Posted 11-22-2023 04:11 PM

Thank you, Kyle! That makes good sense.

Mark

Interpreting cubic vs. quadratic model fit and p values.