I have data on surgery number (1-25) and surgery minutes by the same physician; he wants to know when his data plateaus. I've seen the example (https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...) but am not sure of the parameters I need to specify for this. Any help would be welcomed. I'd like to plot the data; here's what the data look like (surgery #, date, length of procedure):
1 | 3/20/2019 | 182 |
2 | 5/16/2019 | 150 |
3 | 5/30/2019 | 223 |
4 | 6/6/2019 | 142 |
5 | 6/11/2019 | 111 |
6 | 7/11/2019 | 164 |
7 | 7/26/2019 | 83 |
8 | 8/22/2019 | 144 |
9 | 8/29/2019 | 162 |
10 | 9/19/2019 | 83 |
11 | 10/10/2019 | 70 |
12 | 10/17/2019 | 114 |
13 | 10/31/2019 | 113 |
14 | 11/7/2019 | 97 |
15 | 11/21/2019 | 83 |
16 | 12/5/2019 | 111 |
17 | 12/5/2019 | 73 |
18 | 12/12/2019 | 87 |
19 | 12/19/2019 | 86 |
20 | 1/9/2020 | 102 |
21 | 1/16/2020 | 124 |
22 | 1/23/2020 | 95 |
23 | 1/30/2020 | 134 |
24 | 3/5/2020 | 121 |
25 | 6/4/2020 | 60 |
Here's an example of 1) how to provide example data and 2) a simple plot of the data
data have; input SurgeryNo date :mmddyy10. duration; format date mmddyy10.; datalines; 1 3/20/2019 182 2 5/16/2019 150 3 5/30/2019 223 4 6/6/2019 142 5 6/11/2019 111 6 7/11/2019 164 7 7/26/2019 83 8 8/22/2019 144 9 8/29/2019 162 10 9/19/2019 83 11 10/10/2019 70 12 10/17/2019 114 13 10/31/2019 113 14 11/7/2019 97 15 11/21/2019 83 16 12/5/2019 111 17 12/5/2019 73 18 12/12/2019 87 19 12/19/2019 86 20 1/9/2020 102 21 1/16/2020 124 22 1/23/2020 95 23 1/30/2020 134 24 3/5/2020 121 25 6/4/2020 60 ; proc sgplot data=have; scatter x=date y=duration; run;
You might add a regression of degree 2,
proc sgplot data=have; scatter x=date y=duration; reg x=date y=duration/degree=2 ; run;
If you want to fit a model then implement the document you site. Maybe.
The model doesn't work for these data, at least not using the alpha, beta, gamma, X0 in the link. That's where I need a bit of help.
Thanks!
Thank you for the plot/model -- it works very well and will help provide what I need to report. I appreciate your time.
If you drop the minimum and maximum durations in this data, you no longer see a "plateau". The plot and regression curve changes from
to
A robust regression (fitting at the median) doesn't show much evidence of a time effect:
proc quantreg data=have ci=resampling;
effect dPoly=poly(date / degree=2 standardize=center details);
model duration = dPoly;
run;
most of the effect that we see in the scatter plot is due to 2 or 3 data points.
I tried a variety of splines (penalized B, thin-plate and loess) to see if there was any indication of something that might look like a plateau, without using a polynomial fit. The penalized B gave something that looked like this:
The other methods showed a smooth decrease. Removing the last date from the penalized B gives something that turns back up (no plateau), while the other two methods give;
Loess:
Thin plate spline:
These give some support to a plateau being reached some time in November 2019. That last point has a tremendous amount of leverage, and really shifts the resulting plots. Code for these follows:
proc sgplot data=have;
scatter x=date y=duration;
pbspline x=date y=duration;
run;
ods graphics on;
proc tpspline data=have plots(only)=(criterionplot fitplot(clm));
model duration = (date) /alpha = 0.1;
output out = result pred uclm lclm;
run;
proc loess data=have;
model duration=date;
run;
/* Delete the last point */
data have2;
set have;
if surgeryno=25 then delete;
run;
proc sgplot data=have2;
scatter x=date y=duration;
pbspline x=date y=duration;
run;
proc tpspline data=have2 plots(only)=(criterionplot fitplot(clm));
model duration = (date) /alpha = 0.1;
output out = result pred uclm lclm;
run;
proc loess data=have2;
model duration=date;
run;
SteveDenham
Thank you very much for your time. This is quite helpful and thorough.
> The model doesn't work for these data, at least not using the alpha, beta, gamma, X0 in the link.
It is not surprising that the initial guesses for the documentation example do not work for your data, which has a completely different scale. The initial values of the parameters should be somewhat close to the final (optimal) values. For your data, I would use a simple quadratic regression model to estimate the parameters. Then the NLIN Procedure fits the data without difficulty and reports a plateau of 98 minutes, reached on 01JAN2020.
/* rescale by using x="days since start" as the variable */
%let RefDate = '20MAR2019'd;
data New;
set Have;
rename duration=y;
x = date - &RefDate; /* days since first record */
run;
proc glm data=New;
model y = x x*x;
run;
title 'Quadratic Model with Plateau';
proc nlin data=New plots=fit;
parms alpha=184 beta= -0.5 gamma= 0.001;
x0 = -.5*beta / gamma;
if (x < x0) then
mean = alpha + beta*x + gamma*x*x;
else mean = alpha + beta*x0 + gamma*x0*x0;
model y = mean;
estimate 'plateau' alpha + beta*x0 + gamma*x0*x0;
estimate 'x0' -beta / (2*gamma);
output out=b predicted=yp L95M=lower U95M=upper;
run;
/* convert x back to original scale (date) */
data _NULL_;
plateauDate = 287 + &RefDate;
call symput('x0', plateauDate);
put plateauDate DATE10.;
run;
%put &=x0;
proc sgplot data=b noautolegend;
band x=date lower=lower upper=upper;
refline 97.97 / axis=y label="Plateau" labelpos=max;
refline &x0 / axis=x label="01JAN2020" labelpos=max;
scatter x=date y=y;
series x=date y=yp;
yaxis label='Duration';
run;
Thank you for the post. I could not find an example or documentation on selecting a better value. I appreciate your time.
If you want more information about how to select initial values, try these:
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.