Hi -- I asked this yesterday, but I want to frame it differently today.
I'd like to find the plateau point without date -- just surgery number. The sas file is attached, and the data are simple. 25 surgeries, numbered 1 to 25. I used a pbspline, but it certainly looks like no plateau. If so, that's fine, but I need someone who knows this better than me to say so. I appreciate any help you can lend.
So here is another way, that uses a segmented quadratic model.
Code:
title 'Quadratic Model with Plateau';
proc nlin data=have;
parms alpha=100 beta=-1 gamma=0.0033333;
x0 = -.5*beta / gamma;
if (surgeryno < x0) then
mean = alpha + beta*surgeryno + gamma*surgeryno*surgeryno;
else mean = alpha + beta*x0 + gamma*x0*x0;
model duration = mean;
if _obs_=1 and _iter_ =. then do;
plateau =alpha + beta*x0 + gamma*x0*x0;
put / x0= plateau= ;
end;
output out=b predicted=yp;
run;
proc sgplot data=b;
scatter x=surgeryno y=duration;
pbspline x=surgeryno y=yp;
run;
Graphic output:
NLIN output:
PUT statement output:
x0=15.170935313 plateau=97.919504645
with some strong caveats then you could say that the plateau of 98 starts at surgery number=15. The caveat here is that the correlations between the parameters in the model are really large, so the calculated x0 value is going to depend on picking out good starting values. Also, the quadratic coefficient (gamma) has confidence bounds that contain 0, implying that it could be removed. If that is the case, there is no plateau, just a consistent decrease.
SteveDenham
Here are the data -- apologize for the simple posting style.
data surgery;
input surgeryno duration;
cards;
1 182
2 150
3 223
4 142
5 111
6 164
7 83
8 144
9 162
10 83
11 70
12 114
13 113
14 97
15 83
16 111
17 73
18 87
19 86
20 102
21 124
22 95
23 134
24 121
25 60
;
proc sgplot data=surgery;
scatter x=surgeryno y=duration;
run;
Same robust approach as before yields a minimum this time:
proc quantreg data=have ci=resampling outest=est;
effect sPoly=poly(surgeryNo/degree=2 standardize=none details);
model duration = sPoly;
output out=preds p=sDuration;
run;
proc sql;
select
-surgeryNo / (2*'surgeryNo^2'n) as minDurSurgery "Surgery of minimum duration",
Intercept + SurgeryNo*calculated minDurSurgery +
'surgeryNo^2'n * (calculated minDurSurgery)**2 "Minimum predicted duration"
from est;
quit;
You're looking for the CDF - the cumulative distribution function to see if there's a plateau. I don't think you have enough data at this point to see it. Usually according to a central limit theorem you'd need about 25 data points but in your case the distribution of the data hasn't stabilized. You'll need more observations to get an answer. You're looking for where the curve plateaus on the CDF graph.
data have;
infile cards dlm='09'x;
input SurgeryNo date :mmddyy10. duration;
format date mmddyy10.;
datalines;
1 3/20/2019 182
2 5/16/2019 150
3 5/30/2019 223
4 6/6/2019 142
5 6/11/2019 111
6 7/11/2019 164
7 7/26/2019 83
8 8/22/2019 144
9 8/29/2019 162
10 9/19/2019 83
11 10/10/2019 70
12 10/17/2019 114
13 10/31/2019 113
14 11/7/2019 97
15 11/21/2019 83
16 12/5/2019 111
17 12/5/2019 73
18 12/12/2019 87
19 12/19/2019 86
20 1/9/2020 102
21 1/16/2020 124
22 1/23/2020 95
23 1/30/2020 134
24 3/5/2020 121
25 6/4/2020 60
;
proc univariate data=have;
cdf duration;
histogram duration;
run;
@gbrussell wrote:
Hi -- I asked this yesterday, but I want to frame it differently today.
I'd like to find the plateau point without date -- just surgery number. The sas file is attached, and the data are simple. 25 surgeries, numbered 1 to 25. I used a pbspline, but it certainly looks like no plateau. If so, that's fine, but I need someone who knows this better than me to say so. I appreciate any help you can lend.
Thin plate spline and loess both show a breakpoint at about surgeryNo = 15. @PGStats method showed a minimum to the quadratic at the same point. To get what I did, simply replace date with SurgeryNo in the code I shared yesterday. Note that the penalized B spline now tries to fit every point and results in a plot that looks like you threw a strand of spaghetti on the table. Also, removing the last surgery has little effect. I have yet to try @Reeza 's approach, but will check back in after I give it a whirl.
SteveDenham
So here is another way, that uses a segmented quadratic model.
Code:
title 'Quadratic Model with Plateau';
proc nlin data=have;
parms alpha=100 beta=-1 gamma=0.0033333;
x0 = -.5*beta / gamma;
if (surgeryno < x0) then
mean = alpha + beta*surgeryno + gamma*surgeryno*surgeryno;
else mean = alpha + beta*x0 + gamma*x0*x0;
model duration = mean;
if _obs_=1 and _iter_ =. then do;
plateau =alpha + beta*x0 + gamma*x0*x0;
put / x0= plateau= ;
end;
output out=b predicted=yp;
run;
proc sgplot data=b;
scatter x=surgeryno y=duration;
pbspline x=surgeryno y=yp;
run;
Graphic output:
NLIN output:
PUT statement output:
x0=15.170935313 plateau=97.919504645
with some strong caveats then you could say that the plateau of 98 starts at surgery number=15. The caveat here is that the correlations between the parameters in the model are really large, so the calculated x0 value is going to depend on picking out good starting values. Also, the quadratic coefficient (gamma) has confidence bounds that contain 0, implying that it could be removed. If that is the case, there is no plateau, just a consistent decrease.
SteveDenham
Well, I knew that beta had to be negative, and that, based on the spline plots, the join point was around 15. So I just used -1 for beta, and solved for gamma, given x0 was 15. To get alpha, I picked something near the median value of all the points. A little sensitivity analysis on alpha revealed that the final estimates were stable across all the values I tried.
I guess the key is using the plots to get an approximate value for x0, so that the ratio between beta and gamma was about twice that value.
SteveDenham
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.