Hi All,
I'm working on a piecewise regression (I think of it as spline regression). I've got data that I want to model with two linear splines (i.e. one knot). I'm doing this for multiple samples, and the location of the knot will vary across samples; I would like to find the "best" knot for each sample.
As an example, data like:
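data foo ;
  call streaminit(12345) ;   /* fixed seed so the simulated data are reproducible */
  do x=1 to 100 ;
    /* flat at 75 up to x=50, then declining with slope -0.6; N(0,5) noise throughout */
    if x<=50 then y= 75 + rand('Normal',0,5) ;
    else y= -.6 * (x-50) + 75 + rand('Normal',0,5) ;
    output ;
  end ;
run ;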
proc sgplot data=foo ;
scatter x=x y=y ;
xaxis values=(0 to 100 by 10) ;
yaxis values=(0 to 100 by 10) ;
run ;
So y is fairly stable for the first 50 values of x, then starts to decrease.
As I haven't done much of this, my first thought was to parameterize it by hand, and just run PROC REG, which seems to work:
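data p1 ;
  set foo ;
  x1=x ;                 /* ordinary slope term over the whole range */
  if x<=50 then x2=0 ;   /* hinge term: 0 up to the knot at x=50 ... */
  else x2=x-50 ;         /* ... then the distance past the knot */
run ;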
proc reg data=p1 ;
model y = x1 x2 ;
output out=pred1 p=predicted ;
run ;
quit ;
proc sgplot data=pred1 ;
scatter x=x y=y ;
series x=x y=predicted ;
xaxis values=(0 to 100 by 10) ;
yaxis values=(0 to 100 by 10) ;
run ;
That gives a fit I think I'm happy with. But my guess is there is likely a better way.
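One possibly simpler route for question 1 below is to let SAS build the spline columns with the EFFECT statement, which PROC GLMSELECT (among other SAS/STAT procedures) supports. This is a sketch, assuming a SAS/STAT release whose SPLINE effect accepts BASIS=TPF(NOINT): with DEGREE=1 and a single knot at 50, that basis should produce the same two columns as x1 and x2 above, so the fit should match the hand-coded PROC REG model. The data set name pred2 is just a hypothetical output name.

```sas
/* One-knot linear spline built by the EFFECT statement instead of by hand.
   BASIS=TPF(NOINT) with DEGREE=1 gives the columns x and (x-50)+ */
proc glmselect data=foo ;
  effect spl = spline(x / basis=tpf(noint) degree=1 knotmethod=list(50)) ;
  model y = spl / selection=none ;          /* fit the full model, no selection */
  output out=pred2 predicted=predicted ;
run ;

proc sgplot data=pred2 ;
  scatter x=x y=y ;
  series x=x y=predicted ;
run ;
```

The knot still has to be supplied (KNOTMETHOD=LIST), so this only replaces the dummy-coding step, not the search for the knot.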
Questions:
1. If you accept the constraint that I want two linear splines with a knot between them, is there an easier/better way to fit this model without parameterizing it by hand in a sort of dummy-coding way?
2. The location of the knot will vary from sample to sample (e.g. might be at x=30 rather than x=50). I would like to find the "best" knot for each sample. Any thoughts on finding the best knot? I was thinking something simple like for each sample, run 99 regressions where the knot location is varied from 1 to 99, then find the regression with best fit (R**2), and call that the best knot for that data. But I'm sure there's a better way (NLIN?).
3. If I relaxed the constraint of two *linear* splines, what sort of spliny regression might you use for data like this, where values of y are roughly stable for an unknown period of time, and then begin to decay at some roughly stable rate? Across samples the amount of time (x) before decay starts varies, as does the rate of decay.
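The brute-force idea in question 2 can be sketched in a few steps, assuming the single-sample data set foo from above. Each candidate knot gets its own copy of the data and its own regression, and PROC REG's OUTEST= data set collects one row per knot. The range starts at 2 rather than 1 because a knot at x=1 makes the hinge term identical to x-1 (collinear with x and the intercept). The names grid, knot, and est are hypothetical.

```sas
/* One copy of the data per candidate knot, with the hinge term (x - knot)+ */
data grid ;
  set foo ;
  do knot = 2 to 99 ;
    x2 = max(x - knot, 0) ;
    output ;
  end ;
run ;

proc sort data=grid ;
  by knot x ;
run ;

/* One regression per knot; OUTEST= keeps the estimates and _RMSE_ for each */
proc reg data=grid outest=est noprint ;
  by knot ;
  model y = x x2 ;
run ;
quit ;

/* Every model has the same number of parameters, so the smallest _RMSE_
   is also the largest R-square */
proc sort data=est ;
  by _RMSE_ ;
run ;

proc print data=est(obs=1) ;
  var knot _RMSE_ Intercept x x2 ;
run ;
```

With multiple samples, adding the sample variable to the BY statements (and the sort) would give one best knot per sample.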
Any general suggestions for good intro papers on splines are welcome.
Thanks,
--Q.
@Rick_SAS wrote a nice blog entry about this problem
https://blogs.sas.com/content/iml/2020/12/14/segmented-regression-sas.html
@Rick_SAS wrote a nice blog entry about this problem
https://blogs.sas.com/content/iml/2020/12/14/segmented-regression-sas.html
Yeah, I've read that and some of his other spliny blog posts, but I've been intimidated by them (which is of course due to my own limitations). Basically "NLIN, why did it have to be NLIN?"
But I agree, this blog post looks like what I want to do. Another colleague mentioned PROC MODEL.
Maybe I'll take another stab at learning NLIN, and will post what I come up with here, in hopes that others will point out my mistakes.
Thx.
Very good idea. You will find that NLIN isn't that hard to master and works quite well.
Thanks again @PGStats, PROC NLIN did a really nice job of it. It took me a little while to work through the docs, and then the high-school algebra (luckily, no calculus needed for my model...).
Model is:
  y = C              if x < x0
  y = B0 + B1 * x    if x >= x0
where C is some constant and x0 is the x value of the knot.
We force the two splines to meet, so:
  C = B0 + B1 * x0
Therefore NLIN models:
  if x < x0 then model y = B0 + B1*x0
  else model y = B0 + B1*x
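Because the flat segment sits exactly at the height the line reaches at the knot, the two branches can also be written as one expression: for x < x0 the maximum is x0, and for x >= x0 it is x.

```latex
y = \begin{cases}
      B_0 + B_1 x_0 & \text{if } x < x_0 \\
      B_0 + B_1 x   & \text{if } x \ge x_0
    \end{cases}
  \;=\; B_0 + B_1 \max(x,\, x_0)
```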
Code:
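proc nlin data=foo ;
  parms x0=15 B0=100 B1=-1 ;            /* starting values: knot, intercept, slope */
  if x < x0 then model y=B0 + B1*x0 ;   /* flat segment at the line's height at the knot */
  else model y=B0 + B1*x ;              /* linear segment after the knot */
  output out=pred predicted=pred ;
run ;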
proc sgplot data=pred ;
scatter x=x y=y ;
series x=x y=pred ;
xaxis values=(0 to 100 by 10) ;
yaxis values=(0 to 100 by 10) ;
run ;
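Since the original goal was one knot per sample, the same model might be extended with a BY statement. This is a sketch under two assumptions: the data carry a sample identifier (the simulated data set multi and variable sample below are hypothetical), and a grid of starting values for x0 in the PARMS statement helps when the knot location varies a lot between samples (NLIN evaluates each grid point and starts from the best one). The IF/ELSE assigns to a program variable, mean, and a single MODEL statement uses it.

```sas
/* Hypothetical multi-sample data: knots at 35, 50, 65; slopes -0.6, -0.8, -1.0 */
data multi ;
  call streaminit(12345) ;
  do sample = 1 to 3 ;
    x0   = 20 + 15*sample ;
    rate = -0.4 - 0.2*sample ;
    do x = 1 to 100 ;
      y = 75 + rate*max(x - x0, 0) + rand('Normal', 0, 5) ;
      output ;
    end ;
  end ;
  keep sample x y ;
run ;

/* One fit per sample; data are already in sample order */
proc nlin data=multi ;
  by sample ;
  parms x0=10 to 90 by 10  B0=100  B1=-1 ;   /* grid of starting knots */
  if x < x0 then mean = B0 + B1*x0 ;         /* flat segment */
  else mean = B0 + B1*x ;                    /* linear segment */
  model y = mean ;
  output out=predmulti predicted=pred ;
run ;
```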
Output:
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.