%let slope1=-2; %let slope2=5; %let turner=30;
data indata;
do i=1 to 100;
if i<&turner. then y=5*ranuni(i)+i*&slope1.;
else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y;
run;quit;
I have observations lying along a broken line like the one above.
How can I fit the data to estimate the parameters (slope1, slope2, and turner)? Thanks.
Here's an example with a curved portion and a linear portion; just replace the curved portion with a straight line
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_nlin_examples01.htm
I want to fit the piecewise line in a single step, since turner (the turning point) is generally unknown.
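proc nlin data=indata;
/* starting values; x0 is the break point (called turner above) */
parms slope1=0 slope2=0 x0=50;
/* the x variable in indata is named i */
if (i < x0) then
mean = i*slope1;
/* the second segment joins the first at i = x0 */
else mean = i*slope2-x0*slope2+x0*slope1;
model y = mean;
output out=indata_out predicted=yp;
run;
proc sgplot data=indata_out;
scatter x=i y=y;
scatter x=i y=yp;
run;quit;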
Yes, it works. But if I set the starting values to slope1=-0.1 slope2=0.1, the results are quite off.
Try other starting values
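One way to try many starting values at once is to give the PARMS statement a grid instead of single values. A minimal sketch, assuming the indata dataset from the first post (x variable named i); the grid ranges and the BEST=5 setting are arbitrary choices for illustration, not values from this thread:

```sas
/* Sketch: let PROC NLIN scan a grid of starting values.      */
/* The grid ranges below are illustrative assumptions.        */
proc nlin data=indata best=5;
  parms slope1=-5 to 5 by 1
        slope2=-5 to 10 by 1
        x0=10 to 90 by 10;
  if (i < x0) then mean = i*slope1;
  else mean = i*slope2 - x0*slope2 + x0*slope1;
  model y = mean;
run;
```

PROC NLIN evaluates the residual sum of squares at every grid point and starts iterating from the best one; BEST=5 only limits the printed list to the five best combinations. Break-point models tend to be sensitive to the start for x0, so a coarse grid over the x range is usually worth the extra evaluations.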
I took the example from the link and it works on my sample dataset. One question: how do I tell whether to use a two-segment regression or a single line?
When turner is 95, 98, or 99, at some point a single line becomes preferable. What criterion tells me that the two-segment regression is preferred?
%let slope1=-2; %let slope2=5; %let turner=90;
data indata;
do i=1 to 100;
if i<&turner. then y=5*ranuni(i)+i*&slope1.;
else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y;
run;quit;
This is a question I have never had before, and I admit I don't know if there is a standard answer in this situation.
I assume you could compare the fits (via R-squared or RMSE or something similar) of the two models (one model with the break point and one model without). Something like the Extra Sum of Squares F-test makes sense to me. https://www.sfu.ca/~lockhart/richard/350/08_2/lectures/FTests/web.pdf
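For reference, the extra-sum-of-squares F statistic is usually written as below. The symbols are my own notation, not taken from the thread: SSE_R and df_R are the error sum of squares and error degrees of freedom of the reduced model (a single line), and SSE_F and df_F are those of the full model (the line with a break point).

```latex
F = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)\,/\,(\mathrm{df}_R - \mathrm{df}_F)}{\mathrm{SSE}_F\,/\,\mathrm{df}_F}
```

Because the full model can never fit worse than the reduced one, the numerator is non-negative, and a large F favors the model with the break point.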
My toy sample's plot and output are below.
I would guess that Estimate/Std Error for slope1 and slope2 might come close to an F-test, since an F-test between the two-segment model and the one-line model is not directly available.
But I am not sure, since this is outside my expertise.
The NLIN Procedure
Estimation Summary (Not Converged)
Subiterations 98
Average Subiterations 3.5
R 0.015325
PPC(intcpt0) 0.019684
RPC .
Object 3.72E-16
Objective 8.39E13
Observations Read 66
Observations Used 66
Observations Missing 0
                          Sum of        Mean               Approx
Source            DF     Squares      Square    F Value    Pr > F
Model              3    1.124E15    3.746E14     276.85    <.0001
Error             62     8.39E13    1.353E12
Corrected Total   65    1.208E15

                          Approx
Parameter   Estimate   Std Error   Approximate 95% Confidence Limits
intcpt0      1453782      526391            401542           2506022
slope1       -482416     41921.6           -566216           -398615
slope2       -121843     13352.5           -148534          -95151.8
x0           21.0000      1.6745           17.6526           24.3474

Approximate Correlation Matrix
                 intcpt0        slope1        slope2            x0
intcpt0        1.0000000    -0.8760376    -0.0000000    -0.4054904
slope1        -0.8760376     1.0000000     0.0000000     0.6943031
slope2        -0.0000000     0.0000000     1.0000000     0.5086293
x0            -0.4054904     0.6943031     0.5086293     1.0000000
You would have to code the equivalent of something like extra-sum-of-squares F-test yourself, based upon your PROC NLIN output. (Or do it with a calculator)
I set turner=95, took the ESS values, followed "Hypothesis Test, 3.2", and got
ESS_F=((315653-318355)/2)/(318355/96)=-0.4. Something must be wrong here?
%let slope1=-2; %let slope2=5; %let turner=95;
data indata;
do ind=1 to 100;
if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=ind y=y;
run;quit;
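%let inds=indata;
%let xvar=ind; %let yvar=y;
%let st_x0=50; /* assumed starting value for the break point; st_x0 is never defined in the original post */
proc nlin data=&inds.;
parms intcpt0=0 slope1=0 slope2=0 x0=&st_x0.;
if (&xvar. < x0) then
mean = intcpt0+&xvar.*slope1;
/* the second segment joins the first at &xvar. = x0 */
else mean = &xvar.*slope2-x0*slope2+x0*slope1+intcpt0;
model &yvar. = mean;
output out=&inds._sel_out predicted=&yvar._p;
run;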
proc reg data=&inds.;
model y=ind;run;quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y
Number of Observations Read 100
Number of Observations Used 100
Analysis of Variance
                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1      315653      315653    8832.21    <.0001
Error             98  3502.40844    35.73886
Corrected Total   99      319156

Root MSE            5.97820    R-Square    0.9890
Dependent Mean    -94.98238    Adj R-Sq    0.9889
Coeff Var          -6.29401

Parameter Estimates
                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1     3.30749     1.20466       2.75      0.0072
ind           1    -1.94633     0.02071     -93.98      <.0001
The NLIN Procedure
                          Sum of        Mean               Approx
Source            DF     Squares      Square    F Value    Pr > F
Model              3      318355      106118    12731.3    <.0001
Error             96       800.2      8.3353
Corrected Total   99      319156

                          Approx
Parameter   Estimate   Std Error   Approximate 95% Confidence Limits
intcpt0       5.6447      0.6003            4.4530            6.8363
slope1       -2.0169      0.0110           -2.0387           -1.9951
slope2        4.5742      0.6901            3.2043            5.9441
x0           94.3985      0.3816           93.6411           95.1559

Approximate Correlation Matrix
                 intcpt0        slope1        slope2            x0
intcpt0        1.0000000    -0.8683135    -0.0000000    -0.1189745
slope1        -0.8683135     1.0000000     0.0000000     0.2046574
slope2        -0.0000000     0.0000000     1.0000000     0.8511409
x0            -0.1189745     0.2046574     0.8511409     1.0000000
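The negative value comes from plugging in the Model rows of the ANOVA tables; the extra-sum-of-squares F-test uses the Error rows. A minimal sketch of the calculation in a DATA step, with the numbers copied from the PROC REG and PROC NLIN output above; the dataset name ess_f_test and the variable names are made up for illustration:

```sas
/* Sketch: extra-sum-of-squares F-test from the two Error rows above. */
data ess_f_test;
  sse_reduced = 3502.40844; df_reduced = 98; /* PROC REG: single line   */
  sse_full    = 800.2;      df_full    = 96; /* PROC NLIN: two segments */
  /* drop in error SS per extra parameter, relative to the full-model MSE */
  f = ((sse_reduced - sse_full) / (df_reduced - df_full))
      / (sse_full / df_full);
  /* upper-tail probability of the F distribution */
  p = sdf('F', f, df_reduced - df_full, df_full);
  put f= p=;
run;
```

With these numbers F comes out at roughly 162 on 2 and 96 degrees of freedom, which strongly favors the two-segment fit. Treat the p-value as a rough guide only: when the true relationship is a single line, the break point x0 is not identifiable, so the F reference distribution is an approximation at best.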
This is called a piecewise linear model when each regression curve is linear. In general, these are called segmented models. See
Is there an index/list of topics that @Rick_SAS has written about? Or should we just assume that he has written about every possible topic 😁🙀👍 ?
No index, but you can search SAS blogs by using the SITE: keyword in an internet search engine. For example:
piecewise regression site:blogs.sas.com
will locate the two articles that I linked to.
Thanks! Actually, Google is almost as good as an index/list.