☑ This topic is solved.
hellohere
Pyrite | Level 9
%let slope1=-2; %let slope2=5; %let turner=30;

data indata;
do i=1 to 100;
	if i<&turner. then y=5*ranuni(i)+i*&slope1.;
	else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y; 
run;quit;

[Figure: SGPlot5.png, scatter plot of y versus i]

I have observations that follow a broken line like the one above.

How can I fit a model to find the parameters (slope1, slope2, and turner)? Thanks,

Accepted Solution
PaigeMiller
Diamond | Level 26

Here's an example with a curved portion and a linear portion; just replace the curved portion with a straight line

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_nlin_examples01.htm

--
Paige Miller


18 Replies
hellohere
Pyrite | Level 9

I mean fitting the broken line all at once, since turner (the turning point) is generally not known in advance.

hellohere
Pyrite | Level 9
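/* Continuous two-segment (broken-line) fit; x0 is the break point */
proc nlin data=indata(rename=(i=x));   /* the simulated data set names the x variable i */
   parms slope1=0 slope2=0 x0=50;      /* starting values */

   if (x < x0) then
        mean = x*slope1;                      /* left segment */
   else mean = x*slope2-x0*slope2+x0*slope1;  /* right segment, joined at x0 */
   model y = mean;

   output out=indata_out predicted=yp;
run;

/* Overlay observed (y) and predicted (yp) values */
proc sgplot data=indata_out;
scatter x=x y=y; 
scatter x=x y=yp; 
run;quit;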

Yes, it works. But if I set the starting values to slope1=-0.1 and slope2=0.1, the results are quite far off.

PaigeMiller
Diamond | Level 26

Try other starting values

--
Paige Miller
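A sketch of one way to act on that advice (not something posted in the thread): the PARMS statement in PROC NLIN accepts a list or a `start to stop by step` range for each parameter, evaluates the residual sum of squares at every combination, and starts iterating from the best one. The block regenerates the first post's data and adds an intercept term intcpt0, as the later code in this thread does; the grid limits and the output data set name fit_grid are illustrative assumptions.

```sas
/* Regenerate the simulated broken-line data from the first post */
%let slope1=-2; %let slope2=5; %let turner=30;

data indata;
   do i=1 to 100;
      if i<&turner. then y=5*ranuni(i)+i*&slope1.;
      else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
      output;
   end;
run;

/* Grid of starting values: NLIN evaluates the SSE at every combination
   (5 x 5 x 5 = 125 points here) and iterates from the best one.
   The grid limits are illustrative assumptions. */
proc nlin data=indata;
   parms intcpt0=0
         slope1=-5 to 5 by 2.5
         slope2=-5 to 5 by 2.5
         x0=10 to 90 by 20;
   if (i < x0) then
        mean = intcpt0 + i*slope1;
   else mean = i*slope2 - x0*slope2 + x0*slope1 + intcpt0;
   model y = mean;
   output out=fit_grid predicted=yp;
run;
```

A wider or finer grid costs more evaluations but raises the chance of starting near the global minimum rather than a local one.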
hellohere
Pyrite | Level 9

I took the example in the link and it works on my sample data set. One question: how can I tell whether to use a 2-step (two-segment) regression or just a 1-step (single-line) regression?

When turner is 95, 98, or 99, it seems that at some point just 1-step (one line) is preferable. What criterion tells me that the 2-step regression is preferred?

 

%let slope1=-2; %let slope2=5; %let turner=90;

data indata;
do i=1 to 100;
	if i<&turner. then y=5*ranuni(i)+i*&slope1.;
	else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y; 
run;quit;
PaigeMiller
Diamond | Level 26

This is a question I have never had before, and I admit I don't know if there is a standard answer in this situation.

 

I assume you could compare the fits (via R-squared or RMSE or something similar) of the two models (one model with the break point and one model without). Something like the Extra Sum of Squares F-test makes sense to me. https://www.sfu.ca/~lockhart/richard/350/08_2/lectures/FTests/web.pdf

--
Paige Miller
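For reference, the extra-sum-of-squares F statistic described above can be written as follows, with the one-line fit as the reduced model (intercept and slope, so n − 2 error degrees of freedom) and the broken-line fit as the full model (intercept, slope1, slope2, and the break point x0, so n − 4). One caution, stated as general background rather than taken from the linked notes: x0 has no meaning under the one-line model, so the F(2, n − 4) reference distribution is only approximate here and the p-value should be read loosely.

```latex
F = \frac{\left(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\right) / \left(df_{\text{reduced}} - df_{\text{full}}\right)}
         {\mathrm{SSE}_{\text{full}} / df_{\text{full}}},
\qquad df_{\text{reduced}} = n - 2, \quad df_{\text{full}} = n - 4
```

Both SSE values come from the Error rows of the respective ANOVA tables, not the Model rows.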
hellohere
Pyrite | Level 9

My toy sample's plot and output are below.

I'd guess that the Estimate/Std Error ratios for slope1 and slope2 might come close to an F-test, since an F-test between the 2-step model and the 1-step model is not directly available.

But I am not sure, since this is not my area of expertise.

 

[Figure: SGPlot31.png, plot of the toy sample]


                                                             The NLIN Procedure

                                                    Estimation Summary (Not Converged)

                                                   Subiterations                      98
                                                   Average Subiterations             3.5
                                                   R                            0.015325
                                                   PPC(intcpt0)                 0.019684
                                                   RPC                                 .
                                                   Object                       3.72E-16
                                                   Objective                     8.39E13
                                                   Observations Read                  66
                                                   Observations Used                  66
                                                   Observations Missing                0


                                                                   Sum of        Mean               Approx
                                 Source                    DF     Squares      Square    F Value    Pr > F

                                 Model                      3    1.124E15    3.746E14     276.85    <.0001
                                 Error                     62     8.39E13    1.353E12
                                 Corrected Total           65    1.208E15


                                                                      Approx
                                        Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

                                        intcpt0         1453782       526391      401542     2506022
                                        slope1          -482416      41921.6     -566216     -398615
                                        slope2          -121843      13352.5     -148534    -95151.8
                                        x0              21.0000       1.6745     17.6526     24.3474


                                                      Approximate Correlation Matrix
                                                  intcpt0          slope1          slope2              x0

                                  intcpt0       1.0000000      -0.8760376      -0.0000000      -0.4054904
                                  slope1       -0.8760376       1.0000000       0.0000000       0.6943031
                                  slope2       -0.0000000       0.0000000       1.0000000       0.5086293
                                  x0           -0.4054904       0.6943031       0.5086293       1.0000000


PaigeMiller
Diamond | Level 26

You would have to code something like the extra-sum-of-squares F-test yourself, based on your PROC NLIN output (or do it with a calculator).

--
Paige Miller
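A minimal sketch of coding that test in SAS, assuming the thread's simulated data (regenerated here with turner=95, as in the next post) and using illustrative data set names (fit_line, fit_seg, ess_ftest): fit both models, output their residuals, and build the F statistic from the two error sums of squares. The grid of starting values in PARMS is an assumption added to help PROC NLIN find the break point, and the p-value is only approximate because x0 is estimated.

```sas
/* Simulated data with a late break point (turner=95) */
%let slope1=-2; %let slope2=5; %let turner=95;

data indata;
   do ind=1 to 100;
      if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
      else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
      output;
   end;
run;

/* Reduced model: one straight line (2 parameters) */
proc reg data=indata noprint;
   model y = ind;
   output out=fit_line r=res_line;
run;
quit;

/* Full model: continuous broken line (4 parameters, break point x0).
   The grid of starting values is an illustrative assumption. */
proc nlin data=indata noprint;
   parms intcpt0=0
         slope1=-4 to 4 by 2
         slope2=-4 to 4 by 2
         x0=10 to 95 by 5;
   if (ind < x0) then
        mean = intcpt0 + ind*slope1;
   else mean = ind*slope2 - x0*slope2 + x0*slope1 + intcpt0;
   model y = mean;
   output out=fit_seg r=res_seg;
run;

/* Extra-sum-of-squares F-test from the two error sums of squares;
   numerator df = 4 - 2 = 2, denominator df = n - 4.
   SDF gives the upper-tail probability (approximate here). */
proc sql;
   create table ess_ftest as
   select l.sse_red, s.sse_full, l.n,
          ((l.sse_red - s.sse_full) / 2) / (s.sse_full / (l.n - 4)) as F,
          sdf('F', calculated F, 2, l.n - 4) as p
   from (select sum(res_line*res_line) as sse_red,
                count(res_line) as n
           from fit_line) as l,
        (select sum(res_seg*res_seg) as sse_full
           from fit_seg) as s;
   select * from ess_ftest;
quit;
```

Working from residuals rather than from displayed ANOVA tables avoids depending on the column names of ODS output data sets.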
hellohere
Pyrite | Level 9

I set turner=95, took the sums of squares from both fits, followed "Hypothesis Test, 3.2", and got

ESS_F=((315653-318355)/2)/(318355/96)=-0.4. Something must not be right?!

 


%let slope1=-2; %let slope2=5; %let turner=95;

data indata;
do ind=1 to 100;
	if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
	else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=ind y=y; 
run;quit;

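%let inds=indata;
%let xvar=ind; %let yvar=y;
%let st_x0=50;   /* starting guess for the break point; assumed value, not shown in the original post */

	/* Broken-line (full) model: x0 is the break point */
	proc nlin data=&inds.;
	   parms intcpt0=0 slope1=0 slope2=0 x0=&st_x0;
	   if (&xvar. < x0) then
	        mean = intcpt0+&xvar.*slope1;
	   else mean = &xvar.*slope2-x0*slope2+x0*slope1+intcpt0;
	   model &yvar. = mean;

	   output out=&inds._sel_out predicted=&yvar._p;
	run;

	/* Single-line (reduced) model for comparison */
	proc reg data=&inds.;
	model &yvar.=&xvar.;run;quit;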

 


                                                             The REG Procedure
                                                               Model: MODEL1
                                                           Dependent Variable: y

                                                  Number of Observations Read         100
                                                  Number of Observations Used         100


                                                            Analysis of Variance

                                                                   Sum of           Mean
                               Source                   DF        Squares         Square    F Value    Pr > F

                               Model                     1         315653         315653    8832.21    <.0001
                               Error                    98     3502.40844       35.73886
                               Corrected Total          99         319156


                                            Root MSE              5.97820    R-Square     0.9890
                                            Dependent Mean      -94.98238    Adj R-Sq     0.9889
                                            Coeff Var            -6.29401


                                                            Parameter Estimates

                                                         Parameter       Standard
                                    Variable     DF       Estimate          Error    t Value    Pr > |t|

                                    Intercept     1        3.30749        1.20466       2.75      0.0072
                                    ind           1       -1.94633        0.02071     -93.98      <.0001



                                                             The NLIN Procedure

                                                                   Sum of        Mean               Approx
                                 Source                    DF     Squares      Square    F Value    Pr > F

                                 Model                      3      318355      106118    12731.3    <.0001
                                 Error                     96       800.2      8.3353
                                 Corrected Total           99      319156


                                                                      Approx
                                        Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

                                        intcpt0          5.6447       0.6003      4.4530      6.8363
                                        slope1          -2.0169       0.0110     -2.0387     -1.9951
                                        slope2           4.5742       0.6901      3.2043      5.9441
                                        x0              94.3985       0.3816     93.6411     95.1559


                                                      Approximate Correlation Matrix
                                                  intcpt0          slope1          slope2              x0

                                  intcpt0       1.0000000      -0.8683135      -0.0000000      -0.1189745
                                  slope1       -0.8683135       1.0000000       0.0000000       0.2046574
                                  slope2       -0.0000000       0.0000000       1.0000000       0.8511409
                                  x0           -0.1189745       0.2046574       0.8511409       1.0000000
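As a worked check on the numbers posted above (not part of any reply in the thread), plugging the Error rows of the two ANOVA tables into the extra-sum-of-squares formula gives the following. The reduced model is the PROC REG line (SSE 3502.40844 on 98 df) and the full model is the PROC NLIN broken line (SSE 800.2 on 96 df).

```latex
F = \frac{(3502.40844 - 800.2)/(98 - 96)}{800.2/96}
  = \frac{1351.10}{8.3354}
  \approx 162.1
```

The numerator equals the difference of the Model rows (318355 − 315653 = 2702, the same quantity up to rounding), so the negative value in the question comes from subtracting in the reverse order and dividing by a Model sum of squares instead of the full model's error mean square. An F of about 162 on (2, 96) degrees of freedom is far above the 5% critical value of roughly 3.1, so the broken line fits much better here, subject to the caveat that x0 is estimated.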
Ksharp
Super User
@Rick_SAS has written about this topic on his blog before.
Rick_SAS
SAS Super FREQ

This is called a piecewise linear model when each regression curve is linear. In general, these are called segmented models. See
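A sketch of the piecewise linear idea (a common textbook formulation, not necessarily the one in the linked articles): for a fixed break point k, the continuous broken line can be written y = b0 + b1*ind + b2*max(ind − k, 0), which is linear in b0, b1, b2, so PROC REG can fit it, and the second slope is b1 + b2. Fitting every candidate k and keeping the one with the smallest root MSE profiles out the break point. The data step reproduces the thread's turner=95 simulation; the candidate range 2 to 98, the integer step, and the data set names hinge, est_by_k, and best_k are illustrative assumptions, so the chosen k is only as precise as the grid.

```sas
/* Simulated data from this thread (turner=95) */
%let slope1=-2; %let slope2=5; %let turner=95;

data indata;
   do ind=1 to 100;
      if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
      else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
      output;
   end;
run;

/* One copy of the data per candidate break point k, with a hinge term */
data hinge;
   set indata;
   do k = 2 to 98;
      hinge = max(ind - k, 0);   /* zero left of k, grows linearly right of k */
      output;
   end;
run;

proc sort data=hinge;
   by k;
run;

/* Ordinary least squares for each k; OUTEST= keeps _RMSE_ and coefficients */
proc reg data=hinge outest=est_by_k noprint;
   by k;
   model y = ind hinge;
run;
quit;

/* Every k has the same error df, so the smallest _RMSE_ is the smallest SSE */
proc sort data=est_by_k;
   by _rmse_;
run;

data best_k;
   set est_by_k(obs=1);
   slope_left  = ind;           /* b1 */
   slope_right = ind + hinge;   /* b1 + b2 */
   keep k intercept slope_left slope_right _rmse_;
run;

proc print data=best_k;
run;
```

Because each fit is linear, this avoids starting-value problems entirely; the price is that the break point is restricted to the candidate grid, so the best k can serve as a starting value for PROC NLIN if a non-integer x0 is wanted.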

PaigeMiller
Diamond | Level 26

Is there an index/list of topics that @Rick_SAS has written about? Or should we just assume that he has written about every possible topic 😁🙀👍 ?

--
Paige Miller
Rick_SAS
SAS Super FREQ

No index, but you can search SAS blogs by using the SITE: keyword in an internet search engine. For example:

piecewise regression site:blogs.sas.com

will locate the two articles that I linked to.

 

PaigeMiller
Diamond | Level 26

Thanks! Actually, Google is almost as good as an index/list.

--
Paige Miller
