BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gbrussell
Calcite | Level 5

Hi -- I asked this yesterday, but I want to frame it differently today.  

 

I'd like to find the plateau point without date -- just surgery number.  The sas file is attached, and the data are simple.  25 surgeries, numbered 1 to 25.  I used a pbspline, but it certainly looks like no plateau.  If so, that's fine, but I need someone who knows this better than me to say so.  I appreciate any help you can lend.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

So here is another way, that uses a segmented quadratic model.

 

Code:

 

title 'Quadratic Model with Plateau';
proc nlin data=have;
   parms alpha=100 beta=-1 gamma=0.0033333;

   x0 = -.5*beta / gamma;

   if (surgeryno < x0) then
        mean = alpha + beta*surgeryno  + gamma*surgeryno*surgeryno;
   else mean = alpha + beta*x0 + gamma*x0*x0;
   model duration = mean;

   if _obs_=1 and _iter_ =.  then do;
      plateau =alpha + beta*x0 + gamma*x0*x0;
      put /  x0= plateau=  ;
   end;
   output out=b predicted=yp;
run;

proc sgplot data=b;
scatter x=surgeryno y=duration;
pbspline x=surgeryno y=yp;
run;

Graphic output:

 

SteveDenham_0-1606924932731.png

 

NLIN output:

SteveDenham_1-1606925005450.pngSteveDenham_2-1606925045579.png

 

PUT statement output:

x0=15.170935313 plateau=97.919504645

 

with some strong caveats then you could say that the plateau of 98 starts at surgery number=15.  The caveat here is that the correlations between the parameters in the model are really large, so the calculated x0 value is going to depend on picking out good starting values.  Also, the quadratic coefficient (gamma) has confidence bounds that contain 0, implying that it could be removed.  If that is the case, there is no plateau, just a consistent decrease.

 

SteveDenham

 

View solution in original post

11 REPLIES 11
gbrussell
Calcite | Level 5

Here are the data -- apologize for the simple posting style.

 

data surgery;
input surgeryno duration;
cards;
1 182
2 150
3 223
4 142
5 111
6 164
7 83
8 144
9 162
10 83
11 70
12 114
13 113
14 97
15 83
16 111
17 73
18 87
19 86
20 102
21 124
22 95
23 134
24 121
25 60
;


proc sgplot data=surgery;
scatter x=surgeryno y=duration;
run;

PGStats
Opal | Level 21

Same robust approach as before yields a minimum this time:

 

proc quantreg data=have ci=resampling outest=est;
effect sPoly=poly(surgeryNo/degree=2 standardize=none details);
model duration = sPoly;
output out=preds p=sDuration;
run;

proc sql;
select 
    -surgeryNo / (2*'surgeryNo^2'n) as minDurSurgery "Surgery of minimum duration",
    Intercept + SurgeryNo*calculated minDurSurgery + 
        'surgeryNo^2'n * (calculated minDurSurgery)**2 "Minimum predicted duration"
from est;
quit;

PGStats_0-1606884030645.png

 

PG
gbrussell
Calcite | Level 5
Thank you -- your and Steve's help have been very much appreciated.
Reeza
Super User

You're looking for the CDF - the cumulative distribution function to see if there's a plateau. I don't think you have enough data at this point to see it. Usually according to a central limit theorem you'd need about 25 data points but in your case the distribution of the data hasn't stabilized. You'll need more observations to get an answer. You're looking for where the curve plateaus on the CDF graph.

 

data have;
infile cards dlm='09'x;
  input SurgeryNo date :mmddyy10. duration;
  format date mmddyy10.;
datalines;
1	3/20/2019	182
2	5/16/2019	150
3	5/30/2019	223
4	6/6/2019	142
5	6/11/2019	111
6	7/11/2019	164
7	7/26/2019	83
8	8/22/2019	144
9	8/29/2019	162
10	9/19/2019	83
11	10/10/2019	70
12	10/17/2019	114
13	10/31/2019	113
14	11/7/2019	97
15	11/21/2019	83
16	12/5/2019	111
17	12/5/2019	73
18	12/12/2019	87
19	12/19/2019	86
20	1/9/2020	102
21	1/16/2020	124
22	1/23/2020	95
23	1/30/2020	134
24	3/5/2020	121
25	6/4/2020	60
;

proc univariate data=have;
cdf duration;
histogram duration;
run;

@gbrussell wrote:

Hi -- I asked this yesterday, but I want to frame it differently today.  

 

I'd like to find the plateau point without date -- just surgery number.  The sas file is attached, and the data are simple.  25 surgeries, numbered 1 to 25.  I used a pbspline, but it certainly looks like no plateau.  If so, that's fine, but I need someone who knows this better than me to say so.  I appreciate any help you can lend.

 

 

 


 

SteveDenham
Jade | Level 19

Thin plate spline and loess both show a breakpoint at about surgeryNo = 15. @PGStats method showed a minimum to the quadratic at the same point.  To get what I did, simply replace date with SurgeryNo in the code I shared yesterday.  Note that the penalized B spline now tries to fit every point and results in a plot that looks like you threw a strand of spaghetti on the table.  Also, removing the last surgery has little effect. I have yet to try @Reeza 's approach, but will check back in after I give it a whirl.

 

SteveDenham

Reeza
Super User
To me, this question is still ambiguous. I don't know whether the physician is interested in what the expected duration of the procedure will be (plateau of time) or how many procedures it takes to become proficient at a task (# of procedures).

My answer is for the first question - what is the expected time most likely.
Steve's answer is for the second question - how long does it take to become proficient at a task.

OP, can you clarify which business question the physician is trying to answer? Or are both of these off the mark?
PS. This is why it's better to explain the logic of your question. Right now we're very much dealing with an XY problem.
https://xyproblem.info/
gbrussell
Calcite | Level 5
That's a great question and I apologize for lack of clarity / thoroughness. The question is how many procedures are needed to become proficient. The surgeon was trying to assess how many he needed to do before his times hit a plateau.
SteveDenham
Jade | Level 19

So here is another way, that uses a segmented quadratic model.

 

Code:

 

title 'Quadratic Model with Plateau';
proc nlin data=have;
   parms alpha=100 beta=-1 gamma=0.0033333;

   x0 = -.5*beta / gamma;

   if (surgeryno < x0) then
        mean = alpha + beta*surgeryno  + gamma*surgeryno*surgeryno;
   else mean = alpha + beta*x0 + gamma*x0*x0;
   model duration = mean;

   if _obs_=1 and _iter_ =.  then do;
      plateau =alpha + beta*x0 + gamma*x0*x0;
      put /  x0= plateau=  ;
   end;
   output out=b predicted=yp;
run;

proc sgplot data=b;
scatter x=surgeryno y=duration;
pbspline x=surgeryno y=yp;
run;

Graphic output:

 

SteveDenham_0-1606924932731.png

 

NLIN output:

SteveDenham_1-1606925005450.pngSteveDenham_2-1606925045579.png

 

PUT statement output:

x0=15.170935313 plateau=97.919504645

 

with some strong caveats then you could say that the plateau of 98 starts at surgery number=15.  The caveat here is that the correlations between the parameters in the model are really large, so the calculated x0 value is going to depend on picking out good starting values.  Also, the quadratic coefficient (gamma) has confidence bounds that contain 0, implying that it could be removed.  If that is the case, there is no plateau, just a consistent decrease.

 

SteveDenham

 

Reeza
Super User
In reality you would likely also stratify by the complexity of the cases or other variables as required (different clinics may have different equipment, different staff) and would require more data (not just one individual) to make a robust recommendation like this 🙂

gbrussell
Calcite | Level 5
Steve --
This was my original approach, but I could never decide what were good starting values. Could you share insight into how you narrowed that down? I tried a grid approach and never really found anything that made much sense.
Thank you again!
Greg
SteveDenham
Jade | Level 19

Well, I knew that beta had to be negative, and that, based on the spline plots, the join point was around 15.  So I just used -1 for beta, and solved for gamma, given x0 was 15.  To get alpha, I picked something near the median value of all the points.  A little sensitivity analysis on alpha revealed that the final estimates were stable across all the values I tried.

 

I guess the key is using the plots to get an approximate value for x0, so that the ratio between beta and gamma was about twice that value.

 

SteveDenham

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 831 views
  • 2 likes
  • 4 in conversation