Hi All,
Using the dataset below, I am trying to find inflection points in the relationship between surgeon volume and complications. I have read that restricted cubic splines can do this, but I am not sure how to approach it in SAS. Given how my data are structured, I assume I need a logistic regression (complication = age insurance surg_volume) in which surg_volume enters as a spline. Once I have established these inflection points, I plan a separate analysis in which I classify surgeons as low volume or high volume based on the spline knots and then run a survival analysis comparing low-volume and high-volume surgeons.
The dataset below is randomly generated but has the same format as my actual dataset.
data have;
input complication age insurance $ surg_volume;
datalines;
0 84 cigna 11
1 84 aetna 45
0 16 blue 138
1 75 cigna 116
0 46 aetna 134
1 50 blue 118
1 49 cigna 129
1 13 aetna 101
0 43 blue 65
1 32 cigna 14
1 21 aetna 87
0 29 blue 10
0 82 cigna 127
0 16 aetna 61
1 15 blue 21
0 40 cigna 81
0 63 aetna 80
1 69 blue 72
1 21 cigna 27
0 13 aetna 84
0 26 blue 7
0 46 cigna 64
0 35 aetna 10
0 18 blue 75
0 18 cigna 19
0 15 aetna 111
1 36 blue 16
1 16 cigna 130
1 86 aetna 56
0 44 blue 19
0 79 cigna 120
1 29 aetna 70
0 52 blue 94
1 37 cigna 26
0 67 aetna 33
1 49 blue 61
1 31 cigna 54
1 20 aetna 81
0 31 blue 79
1 63 cigna 91
0 50 aetna 131
0 55 blue 18
0 66 cigna 3
0 62 aetna 17
0 79 blue 124
0 82 cigna 21
1 81 aetna 48
1 59 blue 103
0 70 cigna 138
1 19 aetna 64
1 63 blue 147
1 36 cigna 17
1 87 aetna 102
0 63 blue 60
0 18 cigna 114
0 31 aetna 124
1 37 blue 67
0 12 cigna 149
0 42 aetna 95
1 74 blue 118
0 75 cigna 58
0 19 aetna 111
1 31 blue 113
1 26 cigna 53
0 20 aetna 140
0 66 blue 38
1 54 cigna 60
1 47 aetna 135
1 79 blue 121
0 31 cigna 82
1 80 aetna 40
1 59 blue 79
1 18 cigna 87
0 34 aetna 111
1 77 blue 65
1 14 cigna 54
1 59 aetna 39
1 61 blue 48
1 32 cigna 137
1 28 aetna 144
0 21 blue 120
1 17 cigna 58
0 55 aetna 145
1 56 blue 75
0 69 cigna 119
1 15 aetna 105
1 17 blue 130
0 84 cigna 17
0 20 aetna 75
0 75 blue 8
0 84 cigna 101
0 44 aetna 100
1 26 blue 133
0 13 cigna 108
0 61 aetna 85
1 21 blue 119
0 15 cigna 7
0 42 aetna 48
1 75 blue 108
0 70 cigna 13
0 51 aetna 150
1 72 blue 145
0 19 cigna 132
1 81 aetna 32
0 36 blue 134
1 36 cigna 110
0 48 aetna 102
0 87 blue 121
0 65 cigna 64
1 34 aetna 96
1 52 blue 119
1 40 cigna 75
0 32 aetna 92
0 19 blue 56
0 13 cigna 128
0 43 aetna 70
1 75 blue 102
1 81 cigna 134
0 17 aetna 20
1 68 blue 3
1 85 cigna 139
0 87 aetna 32
1 72 blue 135
1 30 cigna 113
1 43 aetna 113
1 57 blue 118
0 74 cigna 2
0 79 aetna 23
0 81 blue 42
1 49 cigna 125
0 65 aetna 76
1 59 blue 145
0 26 cigna 112
0 84 aetna 33
1 23 blue 100
1 40 cigna 124
0 35 aetna 98
1 60 blue 102
0 12 cigna 137
1 70 aetna 78
1 83 blue 11
1 60 cigna 120
1 87 aetna 97
0 87 blue 34
1 48 cigna 2
0 14 aetna 23
0 74 blue 41
0 48 cigna 119
1 51 aetna 6
0 78 blue 37
1 42 cigna 63
1 13 aetna 141
1 57 blue 96
0 76 cigna 107
0 66 aetna 30
0 43 blue 94
1 61 cigna 148
0 16 aetna 7
1 41 blue 90
1 56 cigna 117
1 73 aetna 15
1 66 blue 31
0 37 cigna 34
0 22 aetna 16
1 59 blue 68
;
You can use the EFFECT statement to create the spline and use the EFFECTPLOT statement to visualize it. Try this:
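proc logistic data=Have;
   class insurance;
   /* restricted (natural) cubic spline for surg_volume, knots at 5 equally spaced percentiles */
   effect spl = spline(surg_volume / details naturalcubic basis=tpf(noint)
                       knotmethod=percentiles(5));
   /* or, in SAS 9.4M6 and later: knotmethod=percentilelist(5 27.5 50 72.5 95) ); */
   model complication(event='1') = age insurance spl;
   /* predicted probability vs. surg_volume, one curve per insurance level */
   effectplot slicefit(x=surg_volume sliceby=insurance) / obs;
run;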
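For SAS releases before 9.4M6, which lack PERCENTILELIST, one way to get the 5, 27.5, 50, 72.5, and 95th percentile knots is to compute them with PROC UNIVARIATE and pass them to KNOTMETHOD=LIST. This is a sketch that assumes the HAVE data set above; the names Pctl and knotlist are arbitrary. The default percentile definition in PROC UNIVARIATE may not match the one the EFFECT statement uses, so the knots can differ slightly.

```sas
/* Sketch: Harrell-style knot percentiles for releases before 9.4M6. */
/* Assumes the HAVE data set from the question.                      */
proc univariate data=Have noprint;
   var surg_volume;
   output out=Pctl pctlpts=5 27.5 50 72.5 95 pctlpre=P_;
run;

/* Copy the five percentiles into the macro variable KNOTLIST */
data _null_;
   set Pctl;
   length klist $200;
   array p{*} P_:;                 /* all P_ percentile variables, in request order */
   do i = 1 to dim(p);
      klist = catx(' ', klist, p{i});
   end;
   call symputx('knotlist', klist);
run;
%put NOTE: knots at &knotlist;

proc logistic data=Have;
   class insurance;
   effect spl = spline(surg_volume / details naturalcubic basis=tpf(noint)
                       knotmethod=list(&knotlist));
   model complication(event='1') = age insurance spl;
   effectplot slicefit(x=surg_volume sliceby=insurance) / obs;
run;
```

The DETAILS option prints the knot locations, so you can compare them with the PERCENTILES(5) fit.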
Some articles that explain the various techniques:
Thank you for your reply; it is extremely helpful. Your suggested code is similar to what I was already using. What confuses me is how others use restricted cubic splines to identify inflection points. KNOTMETHOD=PERCENTILES(5) places the knots at equally spaced percentiles of surg_volume, which does not necessarily correspond to the inflection points. Do you have any advice on how to identify the inflection points?
I could also convert the outcome to a complication rate for each surg_volume value, giving a continuous dependent variable, if that makes the analysis easier.
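If you do aggregate, you do not need a continuous outcome: PROC LOGISTIC accepts events/trials syntax, which keeps the binomial model. A minimal sketch, assuming the HAVE data set above (Agg, events, and trials are illustrative names). Two caveats: age and insurance vary within a volume level, so this aggregated model cannot adjust for them; and PERCENTILES(5) is now computed over distinct volume values rather than over patients, so the knots move. With the same knots and surg_volume as the only predictor, the estimates match the patient-level model, so aggregating mainly costs you the covariate adjustment.

```sas
/* Sketch: one row per surg_volume with event and trial counts. */
/* Assumes the HAVE data set from the question.                 */
proc means data=Have noprint nway;
   class surg_volume;
   var complication;
   output out=Agg sum=events n=trials;
run;

/* Events/trials syntax: binomial model, no continuous "rate" needed. */
/* Age and insurance cannot be adjusted for at this level.            */
proc logistic data=Agg;
   effect spl = spline(surg_volume / naturalcubic basis=tpf(noint)
                       knotmethod=percentiles(5));
   model events/trials = spl;
   effectplot fit(x=surg_volume) / obs;
run;
```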
> Do you have any advice on how to identify the inflection points?
I do not have any good advice. In the sample code I posted, you can estimate the "elbow" from the graph of the effect plot. Unfortunately, when you have other covariates such as age and insurance, there is not likely to be "THE" inflection point. The location of the elbow could depend on other factors. I would also guess that some doctors (due to temperament and experience) are better at handling high volumes than others.
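One way to put a number on the elbow read from the effect plot is to evaluate the fitted curve on a fine grid and look at its discrete curvature. The sketch below is an assumption, not a method from this thread: it stores the fit, scores a grid with PROC PLM, and reports grid points where the second difference of the predicted log-odds changes sign, i.e. approximate inflection points in the mathematical sense. The profile age=50, insurance='aetna', the grid step 0.5, and the 1e-10 tolerance are hypothetical choices. Because this model is additive on the log-odds scale, these locations do not depend on the chosen profile; on the probability scale (add the ILINK option to SCORE) they can, which fits the point above that there may be no single inflection point. Note also that an inflection point (curvature changes sign) is not the same thing as an elbow (where the bend is sharpest).

```sas
/* Sketch (assumption): approximate inflection points of the fitted */
/* spline. Assumes the HAVE data set from the question.             */
proc logistic data=Have;
   class insurance;
   effect spl = spline(surg_volume / naturalcubic basis=tpf(noint)
                       knotmethod=percentiles(5));
   model complication(event='1') = age insurance spl;
   store work.SplModel;                 /* save the fit for PROC PLM */
run;

/* Hypothetical reference profile; any age/insurance could be used */
data Grid;
   length insurance $8;
   age = 50;
   insurance = 'aetna';
   do surg_volume = 2 to 150 by 0.5;
      output;
   end;
run;

/* Predicted log-odds (no ILINK option) at each grid point */
proc plm restore=work.SplModel;
   score data=Grid out=Scored predicted=LogOdds;
run;

/* Keep grid points just after the sign of the curvature changes */
data Flagged;
   set Scored;
   retain last_sign 0;
   d1 = dif(LogOdds);                   /* first difference ~ slope        */
   d2 = dif(d1);                        /* second difference ~ curvature   */
   s  = sign(round(d2, 1e-10));         /* tiny values (linear tails) -> 0 */
   if s in (-1, 1) then do;
      if last_sign ne 0 and s ne last_sign then output;
      last_sign = s;
   end;
run;

proc print data=Flagged;
   var surg_volume LogOdds d2;
run;
```

A natural cubic spline is linear beyond its boundary knots, which is why near-zero second differences are rounded to 0 and skipped rather than counted as sign changes.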
If you want the location of the elbow to be estimated by the model, you need to include it as a parameter in the model, which leads to piecewise regression models (link in my previous reply).
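One simple variant of piecewise (segmented) regression for a binary outcome is a profile-likelihood grid search: fit a logistic model with a hinge term max(surg_volume - c, 0) for each candidate change point c and keep the c with the smallest -2 Log L. This is a sketch, not necessarily the method in the piecewise regression article referenced above; the candidate grid 20 to 130 by 5 and the names Stacked, Fit, and Profile are assumptions. BY-group processing avoids a macro loop. Standard errors from the chosen fit ignore the fact that c was estimated, and the profile can be flat, so treat the result as exploratory.

```sas
/* Sketch: profile likelihood for one change point c in a hinge   */
/* logistic model. Assumes the HAVE data set from the question.   */
data Stacked;
   set Have;
   do c = 20 to 130 by 5;               /* hypothetical candidate grid */
      hinge = max(surg_volume - c, 0);  /* extra slope above c         */
      output;
   end;
run;

proc sort data=Stacked;
   by c;
run;

ods exclude all;                        /* suppress displayed output of the many fits */
proc logistic data=Stacked;
   by c;
   class insurance;
   model complication(event='1') = age insurance surg_volume hinge;
   ods output FitStatistics=Fit;
run;
ods exclude none;

/* -2 Log L of the full model for each candidate c */
data Profile;
   set Fit;
   where Criterion = '-2 Log L';
   keep c InterceptAndCovariates;
run;

proc sort data=Profile;
   by InterceptAndCovariates;
run;

/* First row: the candidate c with the smallest -2 Log L */
proc print data=Profile(obs=5);
run;
```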