hellohere
Pyrite | Level 9
%let slope1=-2; %let slope2=5; %let turner=30;

data indata;
do i=1 to 100;
	if i<&turner. then y=5*ranuni(i)+i*&slope1.;
	else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y; 
run;quit;

SGPlot5.png

I have observations that follow a broken line like the one above.

How can I fit the data to estimate the parameters (slope1, slope2 and turner)? Thanks.

Accepted Solutions
PaigeMiller
Diamond | Level 26

Here's an example with a curved portion and a linear portion; just replace the curved portion with a straight line

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_nlin_examples01.htm

--
Paige Miller

hellohere
Pyrite | Level 9

I mean to fit the piecewise line in a single step, since turner (the turning point) is generally unknown.

hellohere
Pyrite | Level 9
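proc nlin data=indata;
   /* starting values; x0 is the breakpoint (turner) */
   parms slope1=0 slope2=0 x0=50;

   /* indata stores the predictor as i, so the model is written in terms of i */
   if (i < x0) then
        mean = i*slope1;
   else mean = i*slope2-x0*slope2+x0*slope1;
   model y = mean;

   output out=indata_out predicted=yp;
run;
proc sgplot data=indata_out;
scatter x=i y=y; 
scatter x=i y=yp; 
run;quit;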

Yes, it works. But if I set the starting values to slope1=-0.1 slope2=0.1, the results are quite off.
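For reference, the mean function coded in the PROC NLIN step above can be written as follows. The symbols are my own notation: $x$ is the predictor, and $s_1$, $s_2$, $x_0$ stand for slope1, slope2 and x0. The constant term in the second branch is what forces the two segments to meet at $x_0$.

```latex
\[
E[y \mid x] =
\begin{cases}
  s_1 x, & x < x_0 \\
  s_1 x_0 + s_2 (x - x_0), & x \ge x_0
\end{cases}
\]
```

Expanding the second branch gives $s_2 x - s_2 x_0 + s_1 x_0$, which is the expression in the ELSE statement.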

PaigeMiller
Diamond | Level 26

Try other starting values

--
Paige Miller
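One way to try other starting values without guessing them by hand is to give the PARMS statement a grid; PROC NLIN then evaluates the residual sum of squares at every grid point and starts iterating from the best one. The following is only a sketch: the grid ranges and the bounds on x0 are illustrative assumptions, not values from this thread, and should be adapted to the data.

```sas
/* Sketch: let PROC NLIN scan a grid of starting values.           */
/* Grid ranges and bounds are illustrative assumptions.            */
proc nlin data=indata;
   parms slope1 = -5 to 5 by 2.5
         slope2 = -5 to 5 by 2.5
         x0     = 10 to 90 by 10;
   bounds 1 <= x0 <= 100;   /* keep the breakpoint inside the range of i */

   if (i < x0) then mean = i*slope1;
   else mean = i*slope2 - x0*slope2 + x0*slope1;
   model y = mean;
run;
```

A coarse grid is usually enough here, since the goal is only to start close enough to the solution for the iterations to converge.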
hellohere
Pyrite | Level 9

I took the example in the link and it works on my sample dataset. One question: how do I tell whether to use the 2-step (two-segment) regression or just a 1-step regression?

When turner is 95, 98 or 99, at some point a 1-step fit (a single line) is preferable. What criterion tells me that the 2-step regression is preferred?

 

%let slope1=-2; %let slope2=5; %let turner=90;

data indata;
do i=1 to 100;
	if i<&turner. then y=5*ranuni(i)+i*&slope1.;
	else y=10*ranuni(i)+i*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=i y=y; 
run;quit;
PaigeMiller
Diamond | Level 26

This is a question I have never had before, and I admit I don't know if there is a standard answer in this situation.

 

I assume you could compare the fits (via R-squared or RMSE or something similar) of the two models (one model with the break point and one model without). Something like the Extra Sum of Squares F-test makes sense to me. https://www.sfu.ca/~lockhart/richard/350/08_2/lectures/FTests/web.pdf

--
Paige Miller
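The extra-sum-of-squares F-test mentioned above compares the error sums of squares of the two fits. In the notation below (my own), $SSE_R$ and $df_R$ are the error sum of squares and error degrees of freedom of the reduced model (one line), and $SSE_F$ and $df_F$ are those of the full model (two segments with a breakpoint).

```latex
\[
F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}
\]
```

The statistic is compared with an $F$ distribution on $(df_R - df_F,\ df_F)$ degrees of freedom. For a nonlinear model this is only approximate, and the breakpoint is not identified when the single-line model is true, so the resulting p-value is best treated as a rough guide.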
hellohere
Pyrite | Level 9

My toy sample's plot and Output are below.

 

My guess is that the ratio Estimate/StdErr for slope1 and slope2 might serve as a rough substitute for the F-test, since an F-test between the 2-step model and the 1-step model is not directly available.

But I am not sure, since this is outside my expertise.

 

SGPlot31.png


                                                             The NLIN Procedure

                                                    Estimation Summary (Not Converged)

                                                   Subiterations                      98
                                                   Average Subiterations             3.5
                                                   R                            0.015325
                                                   PPC(intcpt0)                 0.019684
                                                   RPC                                 .
                                                   Object                       3.72E-16
                                                   Objective                     8.39E13
                                                   Observations Read                  66
                                                   Observations Used                  66
                                                   Observations Missing                0


                                                                   Sum of        Mean               Approx
                                 Source                    DF     Squares      Square    F Value    Pr > F

                                 Model                      3    1.124E15    3.746E14     276.85    <.0001
                                 Error                     62     8.39E13    1.353E12
                                 Corrected Total           65    1.208E15


                                                                      Approx
                                        Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

                                        intcpt0         1453782       526391      401542     2506022
                                        slope1          -482416      41921.6     -566216     -398615
                                        slope2          -121843      13352.5     -148534    -95151.8
                                        x0              21.0000       1.6745     17.6526     24.3474


                                                      Approximate Correlation Matrix
                                                  intcpt0          slope1          slope2              x0

                                  intcpt0       1.0000000      -0.8760376      -0.0000000      -0.4054904
                                  slope1       -0.8760376       1.0000000       0.0000000       0.6943031
                                  slope2       -0.0000000       0.0000000       1.0000000       0.5086293
                                  x0           -0.4054904       0.6943031       0.5086293       1.0000000


PaigeMiller
Diamond | Level 26

You would have to code the equivalent of something like extra-sum-of-squares F-test yourself, based upon your PROC NLIN output. (Or do it with a calculator)

--
Paige Miller
hellohere
Pyrite | Level 9

I set turner=95, took the sums of squares from the output below, followed "Hypothesis Test, 3.2", and got

ESS_F=((315653-318355)/2)/(318355/96)=-0.4. Something must be wrong here?

 


%let slope1=-2; %let slope2=5; %let turner=95;

data indata;
do ind=1 to 100;
	if ind<&turner. then y=10*ranuni(ind)+ind*&slope1.;
	else y=20*ranuni(ind)+ind*&slope2.-&turner.*&slope2.+&turner.*&slope1.;
	output;
end;
run;quit;

ods listing gpath="%sysfunc(getoption(work))";
proc sgplot data=indata;
scatter x=ind y=y; 
run;quit;

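%let inds=indata;
%let xvar=ind; %let yvar=y;
%let st_x0=50;  /* starting value for x0; not shown in the original post, 50 is an assumed placeholder */

	/* full model: two segments joined at x0 */
	proc nlin data=&inds.;
	   parms intcpt0=0 slope1=0 slope2=0 x0=&st_x0.;
	   if (&xvar. < x0) then
	        mean = intcpt0+&xvar.*slope1;
	   else mean = &xvar.*slope2-x0*slope2+x0*slope1+intcpt0;
	   model &yvar. = mean;

	   output out=&inds._sel_out predicted=&yvar._p;
	run;

	/* reduced model: a single straight line */
	proc reg data=&inds.;
	model &yvar.=&xvar.;run;quit;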

 


                                                             The REG Procedure
                                                               Model: MODEL1
                                                           Dependent Variable: y

                                                  Number of Observations Read         100
                                                  Number of Observations Used         100


                                                            Analysis of Variance

                                                                   Sum of           Mean
                               Source                   DF        Squares         Square    F Value    Pr > F

                               Model                     1         315653         315653    8832.21    <.0001
                               Error                    98     3502.40844       35.73886
                               Corrected Total          99         319156


                                            Root MSE              5.97820    R-Square     0.9890
                                            Dependent Mean      -94.98238    Adj R-Sq     0.9889
                                            Coeff Var            -6.29401


                                                            Parameter Estimates

                                                         Parameter       Standard
                                    Variable     DF       Estimate          Error    t Value    Pr > |t|

                                    Intercept     1        3.30749        1.20466       2.75      0.0072
                                    ind           1       -1.94633        0.02071     -93.98      <.0001



                                                             The NLIN Procedure

                                                                   Sum of        Mean               Approx
                                 Source                    DF     Squares      Square    F Value    Pr > F

                                 Model                      3      318355      106118    12731.3    <.0001
                                 Error                     96       800.2      8.3353
                                 Corrected Total           99      319156


                                                                      Approx
                                        Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

                                        intcpt0          5.6447       0.6003      4.4530      6.8363
                                        slope1          -2.0169       0.0110     -2.0387     -1.9951
                                        slope2           4.5742       0.6901      3.2043      5.9441
                                        x0              94.3985       0.3816     93.6411     95.1559


                                                      Approximate Correlation Matrix
                                                  intcpt0          slope1          slope2              x0

                                  intcpt0       1.0000000      -0.8683135      -0.0000000      -0.1189745
                                  slope1       -0.8683135       1.0000000       0.0000000       0.2046574
                                  slope2       -0.0000000       0.0000000       1.0000000       0.8511409
                                  x0           -0.1189745       0.2046574       0.8511409       1.0000000
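A possible reading of the negative value of -0.4: that calculation uses the Model sums of squares (315653 and 318355), whereas the extra-sum-of-squares F-test is built from the Error rows. A minimal sketch with the Error rows of the two ANOVA tables above (3502.40844 on 98 df for PROC REG, 800.2 on 96 df for PROC NLIN) gives an F of roughly 162 on 2 and 96 degrees of freedom. The variable names are mine, and the p-value is only approximate for a breakpoint model.

```sas
/* Extra-sum-of-squares F-test from the Error rows of the two ANOVA tables. */
data _null_;
   sse_reduced = 3502.40844;  df_reduced = 98;   /* PROC REG: single line        */
   sse_full    = 800.2;       df_full    = 96;   /* PROC NLIN: two-segment model */

   f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full);
   p = 1 - probf(f, df_reduced - df_full, df_full);   /* upper-tail F probability */

   put f= p=;
run;
```

The numerator difference, 3502.4 - 800.2, is about 2702, the same magnitude as 318355 - 315653, so the original attempt had the right difference but the wrong sign and the wrong denominator.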
Ksharp
Super User
@Rick_SAS wrote a blog about this topic before.
Rick_SAS
SAS Super FREQ

This is called a piecewise linear model when each regression curve is linear. In general, these are called segmented models. See

PaigeMiller
Diamond | Level 26

Is there an index/list of topics that @Rick_SAS has written about? Or should we just assume that he has written about every possible topic 😁🙀👍 ?

--
Paige Miller
Rick_SAS
SAS Super FREQ

No index, but you can search SAS blogs by using the SITE: keyword in an internet search engine. For example:

piecewise regression site:blogs.sas.com

will locate the two articles that I linked to.

 

PaigeMiller
Diamond | Level 26

Thanks! Actually, Google is almost as good as an index/list.

--
Paige Miller
