Could anybody kindly offer some advice regarding the following issue I'm having with the Quantreg procedure?
Basically, the estimated response (using the PREDICTED keyword in the OUTPUT statement) is giving me one set of estimates. But manually calculating the estimated response using the regression parameters given by the same Quantreg procedure gives a different set of estimates. I would have expected the two to be exactly the same. They are close, but still significantly different. Is there a reason they should be different?
Long story below...
I have a large collection of birthweight data and am trying to establish centile growth curves for these data using Quantreg (i.e., estimate the weight of the infant at different gestational ages). I am fitting the centile curves to a 4th order polynomial with gestational age (gacorr) being the independent variable, and birthweight (bweight) the dependent variable.
Here is a sample of the raw data:
Obs | bweight | bsex | labouronset | gacorr |
---|---|---|---|---|
1 | 160 | M | spontaneous | 17.3593 |
2 | 230 | M | spontaneous | 18.3720 |
3 | 340 | M | spontaneous | 19.2857 |
4 | 270 | M | spontaneous | 19.2857 |
5 | 360 | M | spontaneous | 19.5714 |
Here is the quantreg procedure that I'm using to create a 10th percentile growth curve:
proc quantreg data = AllMaleSP
algorithm = interior (kappa = 0.9) ci = resampling plots (maxpoints = none);
where gacorr >=26 and gacorr <=42;
model bweight = gacorr gacorr*gacorr gacorr*gacorr*gacorr gacorr*gacorr*gacorr*gacorr
/ quantile = 0.1;
output out=AllMaleSPPred predicted = pred;
run;
Here are the regression parameters estimated by quantreg for the 0.1 centile:
Intercept -67485.3
gacorr 8458.845
gacorr*gacorr -397.300
gacorr*gacorr*gacorr 8.3216
gacorr*gacorr*gacorr*gacorr -0.0644
And here is a sample of the output of quantreg, including the estimated response (pred):
Obs | bweight | bsex | labouronset | gacorr | pred | QUANTILE |
---|---|---|---|---|---|---|
1 | 987 | M | spontaneous | 26 | 723.147 | 0.1 |
2 | 746 | M | spontaneous | 26 | 723.147 | 0.1 |
3 | 995 | M | spontaneous | 26 | 723.147 | 0.1 |
4 | 840 | M | spontaneous | 26 | 723.147 | 0.1 |
5 | 760 | M | spontaneous | 26 | 723.147 | 0.1 |
So the PREDICTED output of the quantreg procedure estimates a 10th centile birthweight of 723g at 26 weeks.
But if I use the actual parameter estimates, and plug 26 weeks into the regression formula -67485.3 + 8458.845*gacorr -397.300*gacorr^2 +8.3216*gacorr^3 -0.0644*gacorr^4, I get 701g.
I’ve checked for other gestations, as well as other subsets of the data and other centiles, and get similar discrepancies. I’m reluctant to use the regression formula without knowing the reason it gives a different estimate to the PREDICTED output.
Any ideas? I would greatly appreciate any guidance. Thank you.
The parameter estimates that appear in the ODS table are formatted, so you are seeing rounded values. For example, the coefficient of the quartic term could be anywhere between -0.06435 and -0.064449 and still be displayed as -0.0644. Because the data are not centered, small changes in the coefficient that multiplies 26**4 make a big difference in the predicted values: 26**4 = 456,976, so a change of 0.0001 in that coefficient shifts the prediction by about 46 g. The following DATA step evaluates the formula at the two extremes:
data a;
gacorr = 26;
a4 = -0.06435;
pred = -67485.3 + 8458.845*gacorr -397.300*gacorr**2 +8.3216*gacorr**3 + a4*gacorr**4;
output;
a4 = -0.064449;
pred = -67485.3 + 8458.845*gacorr -397.300*gacorr**2 +8.3216*gacorr**3 + a4*gacorr**4;
output;
run;
proc print;
run;
To get non-formatted estimates, use ODS OUTPUT to create a SAS data set from the ParameterEstimates table. If you use the values in the data set, the predicted values should agree with the results of the procedure.
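As a rough sketch of that suggestion, the step below reruns the model from the question with an ODS OUTPUT statement and then evaluates the polynomial at 26 weeks from the stored values. The data set names `pe` and `manual` are made up for illustration. The sketch assumes the ParameterEstimates table stores the coefficients in a column named `Estimate`, with one row per model term in the order Intercept, linear, quadratic, cubic, quartic, and that only one quantile is fitted.

```sas
/* Save the unrounded estimates to a data set (pe is a hypothetical name) */
ods output ParameterEstimates=pe;
proc quantreg data=AllMaleSP algorithm=interior(kappa=0.9) ci=resampling;
   where gacorr >= 26 and gacorr <= 42;
   model bweight = gacorr gacorr*gacorr gacorr*gacorr*gacorr gacorr*gacorr*gacorr*gacorr
         / quantile=0.1;
run;

/* Show the estimates with as many digits as SAS stores */
proc print data=pe;
   format Estimate best16.;
run;

/* Evaluate the fitted polynomial at gacorr = 26.                 */
/* Assumes row 1 is the intercept and row k is the (k-1)th power. */
data manual;
   set pe end=last;
   pred + Estimate * 26**(_n_ - 1);   /* sum statement: pred starts at 0 and is retained */
   if last then output;
   keep pred;
run;

proc print data=manual;
run;
```

If the assumptions hold, `pred` in `manual` should agree with the PREDICTED value for gacorr = 26 up to floating-point error.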
Thank you!
I've taken the parameter estimates directly from OUTEST and they match almost perfectly now.
Glad you were able to do so, but I have a different question. Why fit a quartic polynomial? Unless you have a good biological reason, wouldn't some other model, perhaps using the EFFECT statement to fit a spline, have resulted in superior performance? I've been digging through my mathematical biology references and I don't see much evidence for any biological processes that give rise to a fourth order response.
Steve Denham
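For anyone wanting to try the spline idea, a minimal sketch might look like the following. The effect name `spl`, the output data set name, and the choice of five knots placed at percentiles of gacorr are all illustrative assumptions, not recommendations from the thread; the knot count and placement would need to be chosen for the data at hand.

```sas
/* Sketch: 10th centile curve with a B-spline effect instead of a quartic */
proc quantreg data=AllMaleSP algorithm=interior(kappa=0.9) ci=resampling;
   where gacorr >= 26 and gacorr <= 42;
   /* spl is a hypothetical effect name; 5 percentile-based knots is an arbitrary choice */
   effect spl = spline(gacorr / knotmethod=percentiles(5));
   model bweight = spl / quantile=0.1;
   output out=AllMaleSPSplinePred predicted=pred;
run;
```

With a spline effect the individual coefficients are not directly interpretable as a growth formula, so predictions would normally be taken from the OUTPUT data set instead of being computed by hand.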