%let slope1=-2; %let slope2=5; %let turner=30;
data indata;
do i=1 to 100;
if i<&turner. then y=5*ranuni(i)+i*&slope1.;
else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y;
run;quit;
I have observations that follow a broken line like the one above.
How can I fit a model to estimate the parameters (slope1, slope2 and turner)? Thanks,
Here's an example with a curved portion and a linear portion; just replace the curved portion with a straight line
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_nlin_examples01.htm
I mean fitting the whole piecewise line in a single step, since turner (the turning point) is generally unknown.
Here's an example with a curved portion and a linear portion; just replace the curved portion with a straight line
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_nlin_examples01.htm
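proc nlin data=indata;
   /* starting values for the two slopes and the break point x0 */
   parms slope1=0 slope2=0 x0=50;
   /* the x variable in indata is named i */
   if (i < x0) then
      mean = i*slope1;
   else
      mean = i*slope2 - x0*slope2 + x0*slope1;   /* continuous at x0 */
   model y = mean;
   output out=indata_out predicted=yp;
run;
proc sgplot data=indata_out;
   scatter x=i y=y;    /* observed values */
   scatter x=i y=yp;   /* fitted values */
run;quit;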
Yes, it works. But if I set the starting values to slope1=-0.1 slope2=0.1, the results are quite off.
Try other starting values
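One way to try many starting values at once is to give the PARMS statement a grid. PROC NLIN then evaluates the residual sum of squares at every grid point and begins iterating from the best one. A minimal sketch against the toy data above; the grid ranges and step sizes are assumptions chosen for illustration, not recommended values:

```sas
proc nlin data=indata;
   /* Grid of candidate starting values (ranges are illustrative). */
   /* NLIN starts its iterations from the best combination.        */
   parms slope1 = -5 to 5 by 1
         slope2 = -5 to 5 by 1
         x0     = 10 to 90 by 10;
   if (i < x0) then
      mean = i*slope1;
   else
      mean = i*slope2 - x0*slope2 + x0*slope1;
   model y = mean;
run;
```

A finer grid on x0 usually matters more than a finer grid on the slopes, because the objective is not smooth in the break point.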
I took the example in the link and it works on my sample dataset. One question: how do I tell whether to use a 2-step (two-segment) regression or just a 1-step (single-line) regression?
When turner is 95, 98 or 99, at some point a single line becomes preferable ... what criterion tells me that the 2-step regression is preferred?
%let slope1=-2; %let slope2=5; %let turner=90;
data indata;
do i=1 to 100;
if i<&turner. then y=5*ranuni(i)+i*&slope1.;
else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y;
run;quit;
This is a question I have never had before, and I admit I don't know if there is a standard answer in this situation.
I assume you could compare the fits (via R-squared or RMSE or something similar) of the two models (one model with the break point and one model without). Something like the Extra Sum of Squares F-test makes sense to me. https://www.sfu.ca/~lockhart/richard/350/08_2/lectures/FTests/web.pdf
My toy sample's plot and Output are below.
My guess is that the EST/STDERR ratios for slope1 and slope2 might serve as a rough substitute for the F-test, since an F-test between the 2-step model and the 1-step model is not directly available.
But I am not sure, since this is outside my expertise.
The NLIN Procedure
Estimation Summary (Not Converged)
Subiterations 98
Average Subiterations 3.5
R 0.015325
PPC(intcpt0) 0.019684
RPC .
Object 3.72E-16
Objective 8.39E13
Observations Read 66
Observations Used 66
Observations Missing 0
Source             DF   Sum of Squares   Mean Square   F Value   Approx Pr > F
Model               3         1.124E15      3.746E14    276.85          <.0001
Error              62          8.39E13      1.353E12
Corrected Total    65         1.208E15

Parameter    Estimate   Approx Std Error   Approximate 95% Confidence Limits
intcpt0       1453782             526391      401542      2506022
slope1        -482416            41921.6     -566216      -398615
slope2        -121843            13352.5     -148534     -95151.8
x0            21.0000             1.6745     17.6526      24.3474
Approximate Correlation Matrix
intcpt0 slope1 slope2 x0
intcpt0 1.0000000 -0.8760376 -0.0000000 -0.4054904
slope1 -0.8760376 1.0000000 0.0000000 0.6943031
slope2 -0.0000000 0.0000000 1.0000000 0.5086293
x0 -0.4054904 0.6943031 0.5086293 1.0000000
You would have to code the equivalent of something like extra-sum-of-squares F-test yourself, based upon your PROC NLIN output. (Or do it with a calculator)
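As a rough sketch of what coding it yourself could look like: the macro below takes the Error sum of squares and Error DF from the one-line fit (reduced model) and from the two-segment fit (full model) and prints the F statistic with its p-value. The macro name `ess_ftest` and its parameter names are hypothetical, and the numbers in the call are made-up placeholders, not results from this thread.

```sas
/* Hypothetical helper: extra-sum-of-squares F-test from two Error rows. */
%macro ess_ftest(sse_reduced=, df_reduced=, sse_full=, df_full=);
   data _null_;
      df_num = &df_reduced. - &df_full.;   /* number of extra parameters */
      f = ((&sse_reduced. - &sse_full.) / df_num) / (&sse_full. / &df_full.);
      p = 1 - probf(f, df_num, &df_full.); /* upper-tail F probability   */
      put f= df_num= p=;
   run;
%mend ess_ftest;

/* Placeholder values only, to show the call. */
%ess_ftest(sse_reduced=120, df_reduced=98, sse_full=100, df_full=96);
```

Because the break point is not identified when there really is only one line, the F distribution is only an approximation here, so the p-value is best read as a rough guide.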
I set turner=95, took the ESS values, followed "Hypothesis Test, 3.2" and got
ESS_F=((315653-318355)/2)/(318355/96)=-0.4, so something must be wrong?
%let slope1=-2; %let slope2=5; %let turner=95;
data indata;
do ind=1 to 100;
if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
output;
end;
run;quit;
ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=ind y=y;
run;quit;
%let inds=indata;
%let xvar=ind; %let yvar=y;
proc nlin data=&inds.;
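parms intcpt0=0 slope1=0 slope2=0 x0=&st_x0; /* st_x0 (starting break point) is not defined in the code shown; it must be set with a %let before this step */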
if (ind < x0) then
mean = intcpt0+ind*slope1;
else mean = ind*slope2-x0*slope2+x0*slope1+intcpt0;
model &yvar. = mean;
output out=&inds._sel_out predicted=&yvar._p;
run;
proc reg data=&inds.;
model y=ind;run;quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y
Number of Observations Read 100
Number of Observations Used 100
Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1           315653        315653   8832.21   <.0001
Error              98       3502.40844      35.73886
Corrected Total    99           319156

Root MSE          5.97820    R-Square   0.9890
Dependent Mean  -94.98238    Adj R-Sq   0.9889
Coeff Var        -6.29401

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1              3.30749          1.20466      2.75     0.0072
ind           1             -1.94633          0.02071    -93.98     <.0001
The NLIN Procedure
Source             DF   Sum of Squares   Mean Square   F Value   Approx Pr > F
Model               3           318355        106118   12731.3          <.0001
Error              96            800.2        8.3353
Corrected Total    99           319156

Parameter    Estimate   Approx Std Error   Approximate 95% Confidence Limits
intcpt0        5.6447             0.6003      4.4530       6.8363
slope1        -2.0169             0.0110     -2.0387      -1.9951
slope2         4.5742             0.6901      3.2043       5.9441
x0            94.3985             0.3816     93.6411      95.1559
Approximate Correlation Matrix
intcpt0 slope1 slope2 x0
intcpt0 1.0000000 -0.8683135 -0.0000000 -0.1189745
slope1 -0.8683135 1.0000000 0.0000000 0.2046574
slope2 -0.0000000 0.0000000 1.0000000 0.8511409
x0 -0.1189745 0.2046574 0.8511409 1.0000000
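The negative value appears to come from plugging in the Model sums of squares (315653 and 318355). The extra-sum-of-squares F-test uses the Error rows instead: the one-line PROC REG fit is the reduced model (SSE 3502.41 on 98 DF) and the two-segment PROC NLIN fit is the full model (SSE 800.2 on 96 DF). With the numbers from the two tables above, the calculation works out roughly as follows:

```latex
\[
F \;=\; \frac{\bigl(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\bigr) \big/ \bigl(\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}}\bigr)}
             {\mathrm{SSE}_{\text{full}} \big/ \mathrm{df}_{\text{full}}}
  \;=\; \frac{(3502.41 - 800.2)/(98 - 96)}{800.2/96}
  \;\approx\; \frac{1351.10}{8.3354}
  \;\approx\; 162.1
\]
```

Compared with an F distribution on 2 and 96 degrees of freedom, a value near 162 strongly favors the two-segment model for this toy data, keeping in mind that the F reference distribution is only approximate for a nonlinear model with an estimated break point.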
This is called a piecewise linear model when each regression curve is linear. In general, these are called segmented models. See
Is there an index/list of topics that @Rick_SAS has written about? Or should we just assume that he has written about every possible topic 😁🙀👍 ?
No index, but you can search SAS blogs by using the SITE: keyword in an internet search engine. For example:
piecewise regression site:blogs.sas.com
will locate the two articles that I linked to.
Thanks! Actually, Google is almost as good as an index/list.