
Hi All,

 

I'm working on a piecewise regression (I think of it as spline regression). I've got data that I want to model with two linear splines (i.e. one knot). I'm doing this for multiple samples, and the location of the knot will vary across samples; I would like to find the "best" knot for each sample.

 

As an example, data like:

data foo ;
  call streaminit(12345); 
  do x=1 to 100 ;
    if x<=50 then y= 75 + rand('Normal',0,5) ;
    else y= -.6 * (x-50) + 75 + rand('Normal',0,5) ;
    output ;
  end ;
run ;

proc sgplot data=foo ;
  scatter x=x y=y ;
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

[Image: spline1.PNG, scatter plot of y against x]

So basically values are fairly stable for the first 50 values, then start to decrease.

 

As I haven't done much of this, my first thought was to parameterize it by hand, and just run PROC REG, which seems to work:

data p1 ;
  set foo ;
  x1=x ;
  if x<=50 then x2=0 ;
  else x2=x-50 ;
run ;

proc reg data=p1 ;
  model y = x1 x2 ;
  output out=pred1 p=predicted ;
run ;
quit ;

proc sgplot data=pred1 ;
  scatter x=x y=y ;
  series x=x y=predicted ;
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

Which yields:

[Image: spline2.PNG, scatter plot with the fitted two-segment line overlaid]

Which I think I'm happy with.  But my guess is there is likely a better way.
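
One possible way to avoid building x2 by hand is a spline EFFECT statement, which several modeling procedures support. The sketch below assumes PROC GLMSELECT is available and that the knot is still fixed at 50; the names spl and pred2 are made up for illustration. With a degree-1 truncated power basis, the effect should generate the same two columns as the hand-coded version, x and max(x-50, 0).

```sas
/* Sketch: same model as the hand-coded PROC REG, with the basis built by EFFECT. */
/* Assumes the knot is known (50 here); spl and pred2 are illustrative names.     */
proc glmselect data=foo ;
  /* degree=1 gives linear pieces; tpf(noint) drops the constant column */
  /* because the MODEL statement already fits an intercept              */
  effect spl = spline(x / degree=1 basis=tpf(noint) knotmethod=list(50)) ;
  model y = spl / selection=none ;   /* no variable selection, plain least squares */
  output out=pred2 predicted=predicted ;
run ;

proc sgplot data=pred2 ;
  scatter x=x y=y ;
  series x=x y=predicted ;
run ;
```

This only removes the manual coding step. The knot location still has to be supplied, so it does not by itself address question 2 below.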

 

Questions:

1. If you accept the constraint that I want two linear splines with a knot between them, is there an easier/better way to fit this model without parameterizing it by hand in a sort of dummy-coding way?

 

2. The location of the knot will vary from sample to sample (e.g. might be at x=30 rather than x=50).  I would like to find the "best" knot for each sample.  Any thoughts on finding the best knot?  I was thinking something simple like for each sample, run 99 regressions where the knot location is varied from 1 to 99, then find the regression with best fit (R**2), and call that the best knot for that data.  But I'm sure there's a better way (NLIN?).

 

3. If I relaxed the constraint of two *linear* splines, what sort of spliny regression might you use for data like this, where values of y are roughly stable for an unknown period of time, and then begin to decay at some roughly stable rate?  Across samples the amount of time (x) before decay starts varies, as does the rate of decay.
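
The brute-force idea in question 2 could be sketched as follows. This is only an illustration: the data set names grid and est are made up, and the candidate knots run from 2 to 99 rather than 1 to 99, because with a knot at x=1 the second regressor equals x-1 for every observation and is collinear with x.

```sas
/* Sketch: one regression per candidate knot, then rank by R-square. */
/* grid and est are illustrative names, not from the original post.  */
data grid ;
  set foo ;
  do knot = 2 to 99 ;          /* knot=1 would make x2 collinear with x */
    x2 = max(x - knot, 0) ;    /* 0 up to the knot, x-knot after it     */
    output ;
  end ;
run ;

proc sort data=grid ;
  by knot x ;
run ;

/* RSQUARE adds _RSQ_ to the OUTEST= data set; _RMSE_ is always there */
proc reg data=grid outest=est rsquare noprint ;
  by knot ;
  model y = x x2 ;
run ;
quit ;

proc sort data=est ;
  by descending _RSQ_ ;
run ;

/* The first row is the best-fitting knot for this sample */
proc print data=est(obs=5) ;
  var knot _RSQ_ _RMSE_ Intercept x x2 ;
run ;
```

Every candidate model has the same number of parameters, so ranking by R-square and ranking by RMSE give the same order. For several samples, a sample identifier could be added to the BY statement ahead of knot.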

 

Any general suggestions for good intro papers on splines are welcome.

 

Thanks,

--Q.

 

Quentin
Super User

Yeah, I've read that and some of his other spliny blog posts, but I've been intimidated by them (which is of course due to my own limitations).  Basically "NLIN, why did it have to be NLIN?"  

 

But I agree, this blog post looks like what I want to do.  Another colleague mentioned PROC MODEL.  


Maybe I'll take another stab at learning NLIN, and will post what I come up with here, in hopes that others will point out my mistakes.

 

Thx.

PGStats
Opal | Level 21

Very good idea. You will find that nlin isn't that hard to master and works quite well.

PG
Quentin
Super User

Thanks again @PGStats, PROC NLIN did a really nice job of it.  It took me a little while to work through the docs, and then the high school algebra (luckily, no calculus was needed for my model...).

 

Model is:
  y = C             if x <  x0
  y = B0 + B1 * x   if x >= x0

where
  C is some constant
  x0 is the x value of the knot

We force the two splines to meet at the knot, so:
  C = B0 + B1 * x0

Therefore NLIN models:
  if x < x0 then model y=B0 + B1*x0 
  else           model y=B0 + B1*x  

Code:

proc nlin data=foo ;
  parms x0=15  B0=100 B1=-1;

  if x < x0 then model y=B0 + B1*x0 ;
  else           model y=B0 + B1*x  ;

  output out=pred predicted=pred ;
run ;


proc sgplot data=pred ;
  scatter x=x y=y ;
  series x=x y=pred ;

  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;

Output:

[Image: NLINsplines.PNG, scatter plot with the PROC NLIN fit overlaid]
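
Since the original goal was a separate knot for each sample, the same model could be run with a BY statement. The sketch below is an assumption-laden illustration: the data set samples, the variable sample, and the two simulated knots (30 and 60) are invented for the example. It also gives PARMS a grid of starting values for x0, so that a single poor starting guess is less likely to hurt samples whose knots are far apart.

```sas
/* Sketch: two simulated samples with different knots (30 and 60). */
/* samples, sample and est are illustrative names.                 */
data samples ;
  call streaminit(12345) ;
  do sample = 1 to 2 ;
    knot = ifn(sample = 1, 30, 60) ;
    do x = 1 to 100 ;
      y = 75 - 0.6 * max(x - knot, 0) + rand('Normal', 0, 5) ;
      output ;
    end ;
  end ;
  drop knot ;
run ;

/* Capture the parameter estimates for every BY group */
ods output ParameterEstimates=est ;

proc nlin data=samples ;
  by sample ;
  /* x0 starts from the best point on a grid of candidates */
  parms x0=10 to 90 by 10  B0=100  B1=-1 ;

  if x < x0 then model y = B0 + B1*x0 ;
  else           model y = B0 + B1*x ;
run ;

/* One row per sample: the estimated knot and its standard error */
proc print data=est ;
  where Parameter = 'x0' ;
run ;
```

The data must be sorted by the BY variable, which is already the case here because of how the samples are generated.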

 

Ksharp
Super User
Why not try a nonparametric method, such as PROC LOESS?
Or add more terms to the model, such as x1^2 and x2^2.
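
A minimal sketch of the LOESS suggestion, assuming the foo data set from the original post. It only overlays a smooth curve with default smoothing and does not estimate a knot, so it is best treated as an exploratory look at the shape.

```sas
/* Sketch: overlay a loess smooth on the raw data (default smoothing). */
proc sgplot data=foo ;
  scatter x=x y=y ;
  loess x=x y=y / nomarkers ;   /* nomarkers avoids drawing the points twice */
  xaxis values=(0 to 100 by 10) ;
  yaxis values=(0 to 100 by 10) ;
run ;
```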
