Solved: Finding the best knot for a linear piecewise regression

Quentin · Posted 03-31-2021 03:00 PM

Hi All,

I'm working on a piecewise regression (I think of it as spline regression). I've got data that I want to model with two linear splines (i.e. one knot). I'm doing this for multiple samples, and the location of the knot will vary across samples; I would like to find the "best" knot for each sample.

As an example, data like:

data foo ;
  call streaminit(12345); 
  do x=1 to 100 ;
    if x<=50 then y= 75 + rand('Normal',0,5) ;
    else y= -.6 * (x-50) + 75 + rand('Normal',0,5) ;
    output ;
  end ;
run ;

proc sgplot data=foo ;
  scatter x=x y=y ;
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

So basically y is fairly stable for the first 50 values of x, then starts to decrease.

As I haven't done much of this, my first thought was to parameterize it by hand, and just run PROC REG, which seems to work:

data p1 ;
  set foo ;
  x1=x ;
  if x<=50 then x2=0 ;
  else x2=x-50 ;
run ;

proc reg data=p1 ;
  model y = x1 x2 ;
  output out=pred1 p=predicted ;
run ;
quit ;

proc sgplot data=pred1 ;
  scatter x=x y=y ;
  series x=x y=predicted ;
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

Which yields: [plot of the data with the fitted two-segment line overlaid; image not included]

I think I'm happy with that, but my guess is there is a better way.
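One possible answer to question 1 below, sketched under the assumption that the knot location is known in advance (50, as in the simulated data): the EFFECT statement, which PROC GLMSELECT and several other modeling procedures support (PROC REG does not), can build the hinge column for you. With DEGREE=1 and a truncated power basis without its own intercept, the spline columns should be x and (x-50)+, i.e. the same x1/x2 parameterization built by hand above. The output data set name pred_spl is only illustrative.

```sas
/* Sketch: let EFFECT generate the hinge term instead of coding x2 by hand.
   Assumes WORK.FOO from the data step above and a known knot at x=50. */
proc glmselect data=foo ;
  /* DEGREE=1 gives straight-line pieces; BASIS=TPF(NOINT) drops the basis
     intercept column so it does not duplicate the model intercept */
  effect spl = spline(x / degree=1 basis=tpf(noint) knotmethod=list(50)) ;
  model y = spl / selection=none ;   /* fit the full model, no variable selection */
  output out=pred_spl predicted=pred ;
run ;
```

The knot still has to be supplied, so this only removes the hand coding; it does not search for the knot.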

Questions:

1. If you accept the constraint that I want two linear splines with a knot between them, is there an easier/better way to fit this model than parameterizing it by hand, dummy-coding style?

2. The location of the knot will vary from sample to sample (e.g. it might be at x=30 rather than x=50). I would like to find the "best" knot for each sample. Any thoughts on finding the best knot? I was thinking of something simple: for each sample, run 99 regressions with the knot location varied from 1 to 99, find the regression with the best fit (R**2), and call that the best knot for that sample. But I'm sure there's a better way (NLIN?).

3. If I relaxed the constraint of two *linear* splines, what sort of spline regression might you use for data like this, where y is roughly stable for an unknown period of time and then begins to decay at a roughly steady rate? Across samples, both the amount of time (x) before decay starts and the rate of decay vary.
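The brute-force search in question 2 can be sketched as follows, reusing the same x1/x2 parameterization as the PROC REG step above. This is a minimal sketch, assuming the data set foo from the top of the post (x from 1 to 100) and integer candidate knots; the data set names grid, fits, and best are illustrative. Because every candidate model has the same number of observations and parameters, the smallest root MSE (_RMSE_ in the OUTEST= data set) picks the same knot as the largest R-square.

```sas
/* Brute-force knot search: refit the hinge model once per candidate knot.
   Assumes WORK.FOO from the data step above, with x running from 1 to 100. */
data grid ;
  set foo ;
  do knot = 2 to 99 ;          /* knot=1 makes x2 collinear with x1; knot=100 makes x2 all zero */
    x1 = x ;
    x2 = max(x - knot, 0) ;    /* hinge: 0 left of the knot, x - knot to the right */
    output ;
  end ;
run ;

proc sort data=grid ;
  by knot ;
run ;

/* one regression per knot; OUTEST= keeps the estimates and _RMSE_ of each fit */
proc reg data=grid outest=fits noprint ;
  by knot ;
  model y = x1 x2 ;
run ;
quit ;

/* same n and same number of parameters for every knot,
   so the smallest root MSE is also the largest R-square */
proc sort data=fits ;
  by _rmse_ ;
run ;

data best ;
  set fits(obs=1 keep=knot _rmse_ intercept x1 x2) ;
run ;

proc print data=best ;
run ;
```

Restricting candidates to integers means the knot can only land on an observed x value; PROC NLIN, suggested later in the thread, treats the knot as a continuous parameter instead.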

Any suggestions for good intro papers on splines are welcome.

Thanks,

--Q.

PGStats · Posted 03-31-2021 04:33 PM

@Rick_SAS wrote a nice blog entry about this problem

https://blogs.sas.com/content/iml/2020/12/14/segmented-regression-sas.html

PG

Quentin · Posted 03-31-2021 05:20 PM

Yeah, I've read that and some of his other spliny blog posts, but I've been intimidated by them (which is of course due to my own limitations). Basically "NLIN, why did it have to be NLIN?"

But I agree, this blog post looks like what I want to do. Another colleague mentioned PROC MODEL.

Maybe I'll take another stab at learning NLIN, and will post what I come up with here, in hopes that others will point out my mistakes.

Thx.

PGStats · Posted 03-31-2021 05:36 PM

Very good idea. You will find that NLIN isn't that hard to master and works quite well.

PG

Quentin · Posted 04-03-2021 09:01 AM

Thanks again @PGStats; PROC NLIN did a really nice job of it. It took me a little while to work through the docs, and then the high-school algebra (luckily, no calculus was needed for my model...)

Model is:
  y= C             if x  < x0
  y= B0 + B1 * x   if x >= x0

where
  C is some constant
  x0 is the x value of the knot

We force the two splines to meet, so:
  C = B0 + B1 * x0

therefore the NLIN model is:
  if x < x0 then model y=B0 + B1*x0 
  else           model y=B0 + B1*x
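Because the flat piece uses x0 exactly when x < x0, the two branches collapse into a single expression, which also makes it plain that the fitted line is continuous at the knot, with plateau C = B0 + B1*x0 and post-knot slope B1:

```latex
y =
\begin{cases}
B_0 + B_1 x_0, & x < x_0 \\
B_0 + B_1 x,   & x \ge x_0
\end{cases}
\;=\; B_0 + B_1 \max(x,\, x_0)
```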

Code:

proc nlin data=foo ;
  parms x0=15  B0=100 B1=-1;

  if x < x0 then model y=B0 + B1*x0 ;
  else           model y=B0 + B1*x  ;

  output out=pred predicted=pred ;
run ;


proc sgplot data=pred ;
  scatter x=x y=y ;
  series x=x y=pred ;

  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

Output: [plot of the data with the NLIN fitted curve overlaid; image not included]
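Two optional PROC NLIN refinements, sketched here rather than taken from the thread: the fitted knot can depend on the starting value of x0, because the model's derivative with respect to x0 jumps as x0 crosses each observed x, so a grid of starting values in the PARMS statement is a cheap safeguard, and a BOUNDS statement keeps the knot inside the data. The grid 10 to 90, the bound limits, and the data set names pred2 and pe are assumptions chosen for this example's x range of 1 to 100.

```sas
/* Variation on the NLIN fit above (assumes WORK.FOO from the first post). */
ods output ParameterEstimates=pe ;   /* save the estimates for later use */

proc nlin data=foo ;
  /* a TO ... BY list makes NLIN evaluate the SSE at each grid point
     and start iterating from the best one (hypothetical grid for x0) */
  parms x0=10 to 90 by 10  B0=100  B1=-1 ;
  /* keep the knot strictly inside the observed range of x */
  bounds 1 < x0 < 100 ;

  if x < x0 then model y = B0 + B1*x0 ;   /* flat segment before the knot  */
  else           model y = B0 + B1*x  ;   /* sloped segment after the knot */

  output out=pred2 predicted=pred ;
run ;
```

For many samples, adding a BY statement (with the data sorted by the sample identifier) would fit one knot per sample in a single run.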

Ksharp · Posted 04-01-2021 07:34 AM

Why not try a nonparametric method, like PROC LOESS?
Or add more terms to the model, like x1^2 and x2^2.
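For a quick look at the nonparametric route (which also touches on question 3), PROC SGPLOT can overlay smoothers that need no pre-chosen knot. This is a sketch assuming WORK.FOO from the first post; the legend labels are arbitrary. A LOESS curve and a penalized B-spline (PBSPLINE) can follow the flat-then-declining shape, but neither reports an explicit knot location or post-knot slope, which is what the NLIN model provides.

```sas
/* Overlay two data-driven smoothers on the raw data (assumes WORK.FOO).
   NOMARKERS avoids redrawing the points already shown by SCATTER. */
proc sgplot data=foo ;
  scatter  x=x y=y ;
  loess    x=x y=y / nomarkers legendlabel="LOESS" ;
  pbspline x=x y=y / nomarkers legendlabel="Penalized B-spline" ;
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;
```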
