I have a question about interpreting polynomial regression. In the quadratic model below, all coefficients are significant. In the cubic model, however, the linear and squared coefficients are not significant, while the cubic coefficient is. The cubic model fits slightly better than the quadratic, but I am not sure whether it should be used given the significance issues. I would greatly appreciate any thoughts on this matter. Thanks!
FYI: day2 = day*day; day3=day*day*day.
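*Quadratic model*;
ods graphics on;
proc reg data=no2019;
  by year;  /* BY processing expects no2019 to be sorted by year */
  model sps = day day2 / lackfit;
  /* save studentized residuals, SE of the mean prediction, fitted values, raw residuals */
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
quit;  /* PROC REG is interactive; QUIT ends the step */
ods graphics off;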
The REG Procedure
Model: MODEL1
Dependent Variable: SPS
YEAR=17
Number of Observations Read | 450 |
Number of Observations Used | 450 |

Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 2 | 1375.32618 | 687.66309 | 192.34 | <.0001 |
Error | 447 | 1598.17313 | 3.57533 | | |
Lack of Fit | 2 | 17.24169 | 8.62084 | 2.43 | 0.0895 |
Pure Error | 445 | 1580.93144 | 3.55265 | | |
Corrected Total | 449 | 2973.49931 | | | |

Root MSE | 1.89085 | R-Square | 0.4625 |
Dependent Mean | 4.85756 | Adj R-Sq | 0.4601 |
Coeff Var | 38.92605 | | |

Parameter Estimates
Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
Intercept | 1 | 7.56219 | 0.20706 | 36.52 | <.0001 |
DAY | 1 | -0.27141 | 0.04139 | -6.56 | <.0001 |
day2 | 1 | 0.00320 | 0.00149 | 2.15 | 0.0321 |
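*Cubic model*;
ods graphics on;
proc reg data=no2019;
  by year;  /* BY processing expects no2019 to be sorted by year */
  model sps = day day2 day3 / lackfit;
  /* note: out=b overwrites the data set written by the quadratic run */
  output out=b
    student=sresid
    stdp=stderr
    p=yhat
    r=yresid;
run;
quit;  /* PROC REG is interactive; QUIT ends the step */
ods graphics off;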
The REG Procedure
Model: MODEL1
Dependent Variable: SPS
YEAR=17
Number of Observations Read | 450 |
Number of Observations Used | 450 |

Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 3 | 1391.51323 | 463.83774 | 130.77 | <.0001 |
Error | 446 | 1581.98608 | 3.54705 | | |
Lack of Fit | 1 | 1.05463 | 1.05463 | 0.30 | 0.5861 |
Pure Error | 445 | 1580.93144 | 3.55265 | | |
Corrected Total | 449 | 2973.49931 | | | |

Root MSE | 1.88336 | R-Square | 0.4680 |
Dependent Mean | 4.85756 | Adj R-Sq | 0.4644 |
Coeff Var | 38.77181 | | |

Parameter Estimates
Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
Intercept | 1 | 7.22585 | 0.25946 | 27.85 | <.0001 |
DAY | 1 | -0.07735 | 0.09976 | -0.78 | 0.4385 |
day2 | 1 | -0.01591 | 0.00907 | -1.75 | 0.0800 |
day3 | 1 | 0.00047628 | 0.00022295 | 2.14 | 0.0332 |
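Because the two models are nested, the question of whether the cubic term earns its place can also be phrased as an extra-sum-of-squares F test, using only the numbers already printed in the two ANOVA tables for YEAR=17. A worked version, as a sketch (the subscripts q and c for the quadratic and cubic fits are notation introduced here, not SAS output):

$$
F = \frac{(SSE_q - SSE_c)/(df_q - df_c)}{MSE_c}
  = \frac{(1598.17313 - 1581.98608)/(447 - 446)}{3.54705}
  = \frac{16.18705}{3.54705} \approx 4.56
$$

On (1, 446) degrees of freedom this is the square of the day3 t statistic (the rounded 2.14 gives about 4.58), so it carries the same p-value, 0.0332. The same 16.187 is also the drop in the lack-of-fit sum of squares, from 17.24169 to 1.05463.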
What exactly is the purpose of the model? Are you looking to describe the fit or use the model for prediction?
Hi,
The purpose is to fit a curve that can be used to estimate or determine when 50% of the maximum average was reached. Here is a graph using the quadratic model.
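For a fitted quadratic, the day at which the curve reaches half of its maximum can be solved in closed form. A worked sketch using the YEAR=17 quadratic estimates from the first post, under the assumption (not stated in the thread) that the maximum is the fitted value at day 0, i.e. the intercept. If the maximum is defined differently, substitute that value for $y_{\max}$:

$$
b_0 + b_1 d + b_2 d^2 = \tfrac{1}{2}\, y_{\max}
\quad\Rightarrow\quad
d_{50} = \frac{-b_1 - \sqrt{b_1^2 - 4 b_2 \left(b_0 - \tfrac{1}{2}\, y_{\max}\right)}}{2 b_2}
$$

With $b_0 = 7.56219$, $b_1 = -0.27141$, $b_2 = 0.00320$ and $y_{\max} = b_0$:

$$
d_{50} = \frac{0.27141 - \sqrt{0.0736634 - 0.0483980}}{0.0064}
\approx \frac{0.27141 - 0.15895}{0.0064} \approx 17.6
$$

The smaller root is used because it is the first crossing on the descending part of the curve, before the fitted minimum near day 42. Whether day 17.6 falls inside the observed range of DAY cannot be checked from the output shown, so the number is illustrative only.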
My response will be somewhat general, as I would need a bit more information to give a more specific answer.
For the purpose of estimating when 50% of the maximum response is reached, fitting the highest-order polynomial that is significant should provide you with the best estimate for your data.
Emphasis on _your data_: an important concept is that you want to fit the simplest model that accounts for most of the variance. Increasing the order of the polynomial can lead to overfitting and limit the model's ability to generalize to other, similar data.
As for the change in the significance of the linear term: it is consistent with the tighter fit to your data once the cubic term is included, which leaves the linear coefficient with less importance.
If you want to describe *only* your data, you are fine to use the cubic model.
If you want to extrapolate to other data sets (making inferences, prediction, etc.), you'd be better off with the quadratic model.
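One common contributor to a shift like this is that raw powers of DAY are strongly correlated with one another, which inflates the standard errors of the lower-order terms; in the output from the original post, the standard error of DAY rises from 0.04139 to 0.09976 once day3 enters. A minimal sketch of how this might be checked, assuming the same no2019 data set and the day2 and day3 variables from the original post. The VIF and SS1 options on the MODEL statement request variance inflation factors and sequential (Type I) sums of squares:

```sas
/* Sketch only: same data set and variables as in the original post */
proc reg data=no2019;
  by year;  /* assumes no2019 is sorted by year */
  /* VIF: variance inflation factor for each term          */
  /* SS1: sequential SS, each term adjusted only for the   */
  /*      terms listed before it in the MODEL statement    */
  model sps = day day2 day3 / vif ss1;
run;
quit;
```

Large variance inflation factors would indicate that the individual t tests on DAY and day2 say little on their own. The Type I sum of squares for each term, divided by the error mean square, gives a sequential F statistic that asks whether that term adds to the lower-order terms already in the model.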
Regards,
Kyle
Thank you, Kyle! That makes good sense.
Mark