Quartz | Level 8

Hi everyone,


I would like to identify a biomarker threshold that can be used for survival prognosis. I have a multivariable Cox regression with a continuous variable that represents biomarker expression, and I would like to identify the level of expression that affects survival.


Is there any procedure or macro that helps with this?

Thanks in advance




I have moved this topic to 'Statistical Procedures' board as it is about survival analysis and PROC PHREG.

Or did you use a procedure other than PROC PHREG?

Or did you use PROC LIFEREG?

Or did you use PROC LOGISTIC (discrete-time logistic hazard model)?




Quartz | Level 8

I am indeed asking about PROC PHREG. I entered the biomarker expression as a continuous variable in the Cox model.




I have a follow-up question.

Why do you want to find a threshold?


Let me guess:

You want to find a threshold for biomarker expression to make a new risk factor X.

The dichotomous risk factor variable X 

  • takes the value 1 if the biomarker expression is equal to or above the threshold and
  • takes the value 0 if the biomarker expression is below the threshold.


You want to find the threshold that maximizes the hazard ratio for the main effect X. 





Hello @Ubai ,


You say: "I introduced the biomarker expression as continuous variable in the Cox model."

Do you have any other explanatory variables as well?
Things become more complicated if your biomarker expression is interacting with other explanatory variables.


But, supposing you have NO other explanatory variables (or you have them, but biomarker expression is only a main effect and not involved in any interaction):


A good threshold can be "guessed" from the effect of your (continuous) biomarker expression on the hazard. You might need to build a spline effect from it (or another transformed feature); otherwise you cannot judge well whether the hazard ratio stays constant over the whole "profile".
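
To make the spline idea concrete, here is a minimal sketch, not a tested recipe. The data set `work.cohort` and the variables `os_months`, `death` (0 = censored), `biomarker` and `age` are hypothetical names standing in for your own data, and the choice of three percentile-based knots is only an assumption to keep the curve stable with a small sample. The model is stored and then handed to PROC PLM to plot the fitted linear predictor (the log relative hazard) against the biomarker; please check in your SAS release that EFFECTPLOT gives you the plot you expect for a PHREG item store.

```sas
/* Sketch only: all data set and variable names are hypothetical. */
proc phreg data=work.cohort;
   /* Natural cubic spline for the biomarker, 3 knots at percentiles */
   effect bm_spl = spline(biomarker / naturalcubic basis=tpf(noint)
                                      knotmethod=percentiles(3));
   /* death = 0 marks censored observations */
   model os_months*death(0) = bm_spl age;
   /* Keep the fitted model for post-fit plotting */
   store work.bm_model;
run;

proc plm restore=work.bm_model;
   /* Fitted linear predictor versus biomarker, with confidence limits */
   effectplot fit(x=biomarker) / clm;
run;
```

A roughly straight line suggests the hazard ratio per unit is constant and no natural threshold exists; a visible bend is a candidate region for a cutpoint.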


Another solution, the easiest one, is to do a grid search.
This solution is pure brute force and not intelligent!
You just try 20 (or XX) thresholds to find the best one.

It's a mere loop over the 20 (or XX) possibilities, followed by a comparison of the 20 (or XX) results. It can be built inside a macro or via data-driven code generation!
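
The loop described above could look roughly like the macro below. This is a sketch under assumptions: the macro name `cut_scan`, its parameters, the work data sets and the indicator `x_high` are all made up for illustration, and the covariates passed in `covars=` are assumed to be numeric (no CLASS statement is generated). For each candidate cutpoint it dichotomizes the marker, refits the Cox model, and stacks the row for the dichotomous indicator into `work.cut_results`.

```sas
/* Sketch only: macro name, parameters and data set names are hypothetical. */
%macro cut_scan(data=, marker=, time=, event=, censval=0, covars=, cuts=);
   %local i cut;

   /* Start from an empty results table */
   proc datasets lib=work nolist nowarn;
      delete cut_results;
   quit;

   %do i = 1 %to %sysfunc(countw(&cuts, %str( )));
      %let cut = %scan(&cuts, &i, %str( ));

      /* x_high = 1 if marker >= cutpoint, 0 if below, missing if marker missing */
      data work._scan;
         set &data;
         if missing(&marker) then x_high = .;
         else x_high = (&marker >= &cut);
      run;

      /* Refit the Cox model quietly and capture the estimates */
      ods exclude all;
      proc phreg data=work._scan;
         model &time*&event(&censval) = x_high &covars / risklimits;
         ods output ParameterEstimates=work._pe;
      run;
      ods exclude none;

      /* Keep only the row for the dichotomous indicator */
      data work._pe;
         set work._pe;
         where upcase(Parameter) = 'X_HIGH';
         cutpoint = &cut;
      run;

      proc append base=work.cut_results data=work._pe force;
      run;
   %end;
%mend cut_scan;

/* Example call with made-up names and cutpoints */
%cut_scan(data=work.cohort, marker=biomarker, time=os_months,
          event=death, covars=age, cuts=2 3 4 5 6);

proc print data=work.cut_results;
   var cutpoint HazardRatio HRLowerCL HRUpperCL ProbChiSq;
run;
```

Keep in mind that picking the cutpoint with the largest hazard ratio (or smallest p-value) out of many candidates makes the reported p-value and hazard ratio too optimistic, so the chosen threshold should be validated on other data or with resampling.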

The last possibility is an intelligent search for the best threshold.
But that can be mathematically cumbersome. You need to write an objective function that you can then maximize subject to constraints. You need SAS/OR or SAS Optimization for that (PROC OPTMODEL or PROC OPTLSO).
LSO stands for Local Search Optimization; working with GA (genetic algorithms) is sometimes easier.

Kind regards,
Koen

Quartz | Level 8

Hi @sbxkoenk,


Thanks for the detailed answer. I do have a fully adjusted Cox model. All established factors associated with survival were included in the model. I have prepared a DAG, and I think it is plausible to assume that the biomarker expression has a main effect on survival and does not interact with other explanatory variables such as treatment.


My plan was to plot a smooth hazard ratio curve using spline effects and try to guess the threshold from it. However, my sample size is relatively small (~100 patients).


Yes, 100 patients is not that much.


I actually never do survival analysis on living organisms (patients / animals / plants).
I only do it on things (like machines or machine parts). Small datasets are never a problem there 😁.


I would try it anyway with that spline effect. Maybe you see a kink in the curve somewhere.


Good luck,


