acordes
Rhodochrosite | Level 12

I want to determine the impact of a car's optional equipment on its residual value, i.e. how much of, say, 3000 € of optional equipment is retained in the car's sale price as a used car.

The optional equipment is only one of many predictors like mileage, age, model, fuel type, etc.

I wrote SAS/IML code to isolate the impact of each numerical predictor and make it visible. To do so, I applied the score code of the gradient boosting model I had trained.

The final result mimics a partial dependence plot but with more control over the crossings. 

 

Here comes my challenge. My boss wants to simplify the model: instead of the step function typical of tree-based algorithms, he wants a flat line up to some value X of optional equipment, followed by a linear decline that reaches the maximum value loss at some value Y of optional equipment.

I tried to explain this by drawing the green line into the plot. 

 

My starting point would be to model the residual value with all predictors except the optional equipment, derive the residuals, and model them with PROC NLIN in the manner described.

 

the original model's "partial dependence plot"

clip_image002.gif 

What I would like to model is the green line. 

So I need the model to define the y-value of the plateau, the x value where the plateau ends and the slope of the curve from then on. 

@Rick_SAS my starting point should be 

https://blogs.sas.com/content/iml/2020/12/14/segmented-regression-sas.html , correct?

piecewise regression.png
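
The shape asked for above (a plateau, a breakpoint, and a linear decline after it) could be sketched in PROC NLIN roughly as follows. This is a minimal sketch, not the method used in the thread: it assumes a data set WORK.want with numeric columns x and y like the one posted further down, and the parameter names (p, x0, s) and their starting values are illustrative guesses.

```sas
/* Sketch only: plateau followed by a linear decline.               */
/* Assumes WORK.want has numeric columns x and y.                   */
/* Parameter names and starting values are illustrative guesses.    */
proc nlin data=want plots=fit noitprint;
   parameters p=0 x0=2000 s=-0.00003;  /* plateau level, breakpoint, slope */
   if (x <= x0) then
      mean = p;                        /* constant segment for x <= x0 */
   else
      mean = p + s*(x - x0);           /* linear segment for x > x0, equals p at x0 */
   model y = mean;
   output out=LinOut predicted=Pred;
run;
```

Writing the linear segment as p + s*(x - x0) makes the two pieces meet at x0 without an extra constraint, so the three estimated parameters are exactly the plateau value, the end of the plateau, and the slope.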

Accepted Solutions
Rick_SAS
SAS Super FREQ

Correct, if you need to estimate the point at which the value is no longer constant. 

Personally, I would discuss with your boss the possibility of using an exponentially decreasing component for the second component, since the linear model will eventually predict a negative value, which does not make sense. 
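
One possible reading of this suggestion is a plateau followed by an exponential approach to a lower limit, so that the predicted loss levels off instead of decreasing without bound. The sketch below is an assumption-laden illustration, not code from the answer: the parameter names (p, x0, ymin, k), the starting values, and the data set WORK.want with columns x and y are all assumed.

```sas
/* Sketch only: plateau followed by exponential decay toward a limit. */
/* Assumes WORK.want has numeric columns x and y.                     */
/* Parameter names and starting values are illustrative guesses.      */
proc nlin data=want plots=fit noitprint;
   parameters p=0 x0=2000 ymin=-0.15 k=0.0005;
   bounds k > 0;                       /* keep the second segment a decay */
   if (x <= x0) then
      mean = p;                        /* constant segment for x <= x0 */
   else
      mean = ymin + (p - ymin)*exp(-k*(x - x0));  /* equals p at x0, tends to ymin */
   model y = mean;
   output out=ExpOut predicted=Pred;
run;
```

The second segment is continuous with the plateau at x0, but its slope there is -k*(p - ymin), which is nonzero whenever the curve actually decays, so the join has a kink.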

acordes
Rhodochrosite | Level 12

Nice.

I've opted for a polynomial model because the condition of equal slopes cannot be met for an exponential model, at least if the plateau is on the left and the exponential is a decay function, correct?

@Rick_SAS , can I restrict the breakpoint to comply with a certain minimum, like 2000 euros in my case?

 

And for the initial parameter estimates I had to resort to an image search on Google and copy the parameters from a curve that had the 'targeted' shape... I had tried PROC GLIMMIX with a polynomial effect, but the result was nearly a model of degree 1.

 

 

nice.png

 

data WORK.want;
  infile datalines dsd truncover;
  input x:COMMA15.1 y:PERCENT12.1;
datalines4;
0.0,0.0%
307.7,0.1%
615.4,0.5%
923.1,0.3%
"1,230.8",(      0.3%)
"1,538.5",0.2%
"1,846.2",(      1.3%)
"2,154.0",(      3.5%)
"2,461.7",(      5.8%)
"2,769.4",(      6.0%)
"3,077.1",(      6.0%)
"3,384.8",(      6.0%)
"3,692.5",(      6.7%)
"4,000.2",(      8.9%)
"4,307.9",(      8.9%)
"4,615.6",(      8.9%)
"4,923.3",(      8.9%)
"5,231.0",(     10.9%)
"5,538.7",(     10.9%)
"5,846.4",(     10.7%)
;;;;
run;

title 'Segmented Model with Plateau';
proc nlin data=want plots=fit noitprint;
parameters a=-1 b=1 c=1 d=0;
   
x0 = (b**2/(9*a**2) - c/(3*a))**0.5 - b/(3*a);   /* one root of 3*a*x**2 + 2*b*x + c = 0 */
if (x > x0) then
mean = d + a*x**3  + b*x**2 + c*x;    /* polynomial model for x > x0 */
else mean = d + a*x0**3  + b*x0**2 + c*x0;  /* constant model for x <= x0 */
model y = mean;

estimate 'plateau'    d + a*x0**3  + b*x0**2 + c*x0;
estimate 'breakpoint' (b**2/(9*a**2) - c/(3*a))**0.5 - b/(3*a);
output out=NLinOut predicted=Pred L95M=Lower U95M=Upper;
ods select ParameterEstimates AdditionalEstimates FitPlot;
run;

 

Rick_SAS
SAS Super FREQ

I don't understand how you solved for the parameter x0. In my blog post, I used the requirement that f(x0)=g(x0) (and similarly for the derivatives) to eliminate x0 from the parameter list. But in your situation, you have represented the equation on the left side of x0 in terms of x0 and the cubic coefficients, so you can't eliminate any of the parameters. 

 

You can use the BOUNDS statement to provide a lower bound for x0:


proc nlin data=want plots=fit noitprint;
parameters a=-1 b=1 c=1 d=0 x0=2000;
bounds x0 >= 2000;

if (x > x0) then
mean = d + a*x**3  + b*x**2 + c*x;          /* polynomial model for x > x0 */
else mean = d + a*x0**3  + b*x0**2 + c*x0;  /* constant model for x <= x0 */
model y = mean;

estimate 'plateau'    d + a*x0**3  + b*x0**2 + c*x0;
estimate 'breakpoint' x0;
output out=NLinOut predicted=Pred L95M=Lower U95M=Upper;
ods select ParameterEstimates AdditionalEstimates FitPlot;
run;
acordes
Rhodochrosite | Level 12

I followed this example from the SAS documentation:

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_nlin_examples01.htm 

I took the first derivative of the degree-3 polynomial, set it to zero, and solved the resulting quadratic for x0. I plugged in one of the two possible solutions.

 

 

Rick_SAS
SAS Super FREQ

That is not correct. If you want continuity, you want the first function evaluated at x0 to equal the second function evaluated at x0. But you have already ensured that by defining the first segment to be the cubic polynomial evaluated at x0.

 

In the doc, the derivative is used to set the derivatives of the two segments equal to each other. This is a smoothness constraint. The first segment (the constant function) has slope 0. If you want the second segment to have slope 0 at x0, you would need to take its derivative at x0 and set it equal to 0. Then the RHS model will have slope 0 at x0. From your picture, the RHS segment does not have zero slope at x0.
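
For reference, the two conditions can be written out for the cubic used in this thread. The names f (cubic segment) and g (plateau) are introduced here only for illustration.

```latex
\begin{aligned}
f(x) &= d + a x^{3} + b x^{2} + c x, \qquad g(x) = f(x_0) \quad \text{(plateau)} \\
\text{continuity:}\quad & g(x_0) = f(x_0) \quad \text{(holds by construction)} \\
\text{smoothness:}\quad & f'(x_0) = 3 a x_0^{2} + 2 b x_0 + c = 0 \\
\Longrightarrow\quad & x_0 = \frac{-b \pm \sqrt{b^{2} - 3 a c}}{3a}, \qquad a \neq 0,\; b^{2} \ge 3 a c
\end{aligned}
```

If the smoothness condition is imposed, x0 is a function of a, b, and c and drops out of the parameter list. If only continuity is wanted, x0 stays a free parameter, as in the version with the BOUNDS statement.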
