farmeister
Calcite | Level 5

Could anybody kindly offer some advice regarding the following issue I'm having with the Quantreg procedure?

Basically, the estimated response (using the PREDICTED keyword in the OUTPUT statement) is giving me one set of estimates.  But manually calculating the estimated response using the regression parameters given by the same Quantreg procedure gives a different set of estimates.  I would have expected the two to be exactly the same.  They are close, but still significantly different. Is there a reason they should be different? 

Long story below...

I have a large collection of birthweight data and am trying to establish centile growth curves for these data using Quantreg (i.e., estimate the weight of the infant at different gestational ages).  I am fitting the centile curves to a 4th order polynomial with gestational age (gacorr) being the independent variable, and birthweight (bweight) the dependent variable.


Here is a sample of the raw data:


Obs  bweight  bsex  labouronset   gacorr
  1      160  M     spontaneous  17.3593
  2      230  M     spontaneous  18.3720
  3      340  M     spontaneous  19.2857
  4      270  M     spontaneous  19.2857
  5      360  M     spontaneous  19.5714


Here is the quantreg procedure that I'm using to create a 10th percentile growth curve:

proc quantreg data = AllMaleSP
      algorithm = interior (kappa = 0.9) ci = resampling plots (maxpoints = none);
   where gacorr >= 26 and gacorr <= 42;
   /* 4th order polynomial in gestational age */
   model bweight = gacorr gacorr*gacorr gacorr*gacorr*gacorr gacorr*gacorr*gacorr*gacorr
         / quantile = 0.1;
   output out = AllMaleSPPred predicted = pred;
run;

Here are the regression parameters estimated by quantreg for the 0.1 centile:


Intercept                      -67485.3
gacorr                         8458.845
gacorr*gacorr                  -397.300
gacorr*gacorr*gacorr             8.3216
gacorr*gacorr*gacorr*gacorr     -0.0644


And here is a sample of the output of quantreg, including the estimated response (pred):


Obs  bweight  bsex  labouronset  gacorr     pred  QUANTILE
  1      987  M     spontaneous      26  723.147       0.1
  2      746  M     spontaneous      26  723.147       0.1
  3      995  M     spontaneous      26  723.147       0.1
  4      840  M     spontaneous      26  723.147       0.1
  5      760  M     spontaneous      26  723.147       0.1


So the PREDICTED output of the quantreg procedure estimates a 10th centile birthweight of 723g at 26 weeks.

But if I use the actual parameter estimates, and plug 26 weeks into the regression formula -67485.3 + 8458.845*gacorr -397.300*gacorr^2 +8.3216*gacorr^3 -0.0644*gacorr^4, I get 701g
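
The hand calculation above can be reproduced with a minimal DATA step. This is only a sketch: the names `check` and `pred_manual` are made up for illustration, and the coefficients are the rounded values displayed in the table, not the full-precision estimates.

```sas
/* Evaluate the displayed (rounded) 10th-centile formula at 26 weeks. */
/* "check" and "pred_manual" are hypothetical names.                  */
data check;
   gacorr = 26;
   pred_manual = -67485.3
                 + 8458.845 * gacorr
                 - 397.300  * gacorr**2
                 + 8.3216   * gacorr**3
                 - 0.0644   * gacorr**4;
   put pred_manual= 8.2;   /* writes the result to the log */
run;
```

With these rounded coefficients the result is about 701.06, compared with 723.147 from the PREDICTED= output.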

I’ve checked for other gestations, as well as other subsets of the data and other centiles, and get similar discrepancies. I’m reluctant to use the regression formula without knowing the reason it gives a different estimate to the PREDICTED output.

Any ideas?  I would greatly appreciate any guidance.  Thank you.

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

The parameter estimates that appear in the ODS table are formatted, so you are seeing rounded values. For example, the coefficient of the quartic term could be anywhere between -0.06435 and -0.064449 and still be displayed as -0.0644.  Because the data are not centered, that coefficient is multiplied by 26**4 = 456,976, so a small change in it makes a big difference in the predicted value:

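data a;
   gacorr = 26;
   /* one end of the range of quartic coefficients displayed as -0.0644 */
   a4 = -0.06435;
   pred = -67485.3 + 8458.845*gacorr - 397.300*gacorr**2 + 8.3216*gacorr**3 + a4*gacorr**4;
   output;
   /* the other end of that range */
   a4 = -0.064449;
   pred = -67485.3 + 8458.845*gacorr - 397.300*gacorr**2 + 8.3216*gacorr**3 + a4*gacorr**4;
   output;
run;

/* compare the two predictions at 26 weeks */
proc print data=a;
run;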

To get non-formatted estimates, use ODS OUTPUT to create a SAS data set from the ParameterEstimates table.  If you use the values in the data set, the predicted values should agree with the results of the procedure.
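
A minimal sketch of that suggestion, reusing the data set and model from the question. The output data set name `PE` is an arbitrary choice, and the BEST16. format is just one way to show more digits than the default ODS display.

```sas
/* Save the ParameterEstimates table to a data set (PE is a hypothetical name). */
proc quantreg data=AllMaleSP algorithm=interior(kappa=0.9) ci=resampling;
   where gacorr >= 26 and gacorr <= 42;
   model bweight = gacorr gacorr*gacorr gacorr*gacorr*gacorr
                   gacorr*gacorr*gacorr*gacorr / quantile=0.1;
   ods output ParameterEstimates=PE;
run;

/* Display every numeric column with a wide format to see the stored digits. */
proc print data=PE;
   format _numeric_ best16.;
run;
```

The values stored in `PE` are not rounded for display, so a manual calculation based on them should agree with the PREDICTED= output up to floating-point error.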


farmeister
Calcite | Level 5

Thank you!

I've taken the parameter estimates directly from OUTEST and they match almost perfectly now. 

SteveDenham
Jade | Level 19

Glad you were able to do so, but I have a different question.  Why fit a quartic polynomial?  Unless you have a good biological reason, wouldn't some other model, perhaps using the EFFECT statement to fit a spline, have resulted in superior performance?  I've been digging through my mathematical biology references and I don't see much evidence for any biological processes that give rise to a fourth order response.
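
For illustration only, a spline fit along those lines might look like the sketch below. The effect name `ga_spl`, the output data set name, and the choice of a natural cubic spline with 5 percentile-based knots are all assumptions, not something tested on these data.

```sas
/* Sketch: replace the quartic polynomial with a spline effect in gacorr. */
/* Effect name, knot count, and output data set name are assumptions.     */
proc quantreg data=AllMaleSP algorithm=interior(kappa=0.9) ci=resampling;
   where gacorr >= 26 and gacorr <= 42;
   effect ga_spl = spline(gacorr / naturalcubic basis=tpf(noint)
                                   knotmethod=percentiles(5));
   model bweight = ga_spl / quantile=0.1;
   output out=AllMaleSPSpline predicted=pred;
run;
```

With a spline effect the coefficients are harder to apply by hand, so the PREDICTED= values from the OUTPUT statement are the practical way to read off the fitted centile curve.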

Steve Denham
