Ubai
Quartz | Level 8

Hi everyone,

 

I would like to identify a biomarker threshold that can be used for survival prognosis. I have a multivariable Cox regression model with a continuous variable that represents biomarker expression, and I would like to identify the level of expression at which survival is affected.

 

Is there any procedure or macro that helps with this?

Thanks in advance

7 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

I have moved this topic to the 'Statistical Procedures' board, as it is about survival analysis and PROC PHREG.

Or did you use a procedure other than PROC PHREG?

Or did you use PROC LIFEREG?

Or did you use PROC LOGISTIC (discrete-time logistic hazard model)?

 

Thanks,

Koen

Ubai
Quartz | Level 8

I am indeed asking about PROC PHREG. I included the biomarker expression as a continuous variable in the Cox model.

sbxkoenk
SAS Super FREQ

Hello,

 

I have a follow-up question.

Why do you want to find a threshold?

 

Let me guess:

You want to find a threshold for biomarker expression to make a new risk factor X.

The dichotomous risk factor variable X 

  • takes the value 1 if the biomarker expression is equal to or above the threshold and
  • takes the value 0 if the biomarker expression is below the threshold.
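
As a minimal sketch of that coding (the dataset WORK.PATIENTS, the variable BIOMARKER and the cutoff 2.5 are made-up names and values, purely for illustration):

```sas
/* Hypothetical names: WORK.PATIENTS, BIOMARKER; the cutoff is an arbitrary example */
%let cutoff = 2.5;

data work.patients_x;
    set work.patients;
    /* Keep X missing when the biomarker is missing: in SAS a missing value
       compares as smaller than any number and would otherwise be coded 0 */
    if missing(biomarker) then x = .;
    else x = (biomarker >= &cutoff);   /* 1 if at or above the cutoff, else 0 */
run;
```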

 

You want to find the threshold that maximizes the hazard ratio for the main effect X. 

Correct?

 

Koen

sbxkoenk
SAS Super FREQ

Hello @Ubai ,

 

You say: "I introduced the biomarker expression as continuous variable in the Cox model."

Do you have any other explanatory variables as well?
Things become more complicated if your biomarker expression interacts with other explanatory variables.

 

But, supposing you have NO other explanatory variables (or you have them, but biomarker expression is only a main effect and not involved in any interaction) :

 

A good threshold can be "guessed" from the effect of your (continuous) biomarker expression on the hazard. You might need to build a spline effect with it (or another transformed feature); otherwise you cannot judge well whether the hazard ratio per unit increase stays constant over the whole "profile".
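
A rough sketch of the spline idea, under assumptions: the dataset WORK.PATIENTS and the variables TIME, STATUS (0 = censored), BIOMARKER and AGE are hypothetical, and the knot count and grid range are arbitrary choices. The model is stored, scored on a grid of biomarker values with PROC PLM, and the resulting linear predictor is plotted so that a kink or plateau becomes visible.

```sas
/* Hypothetical dataset and variable names throughout */
proc phreg data=work.patients;
    /* Natural cubic spline of the biomarker; 3 percentile-based knots is an arbitrary choice */
    effect spl = spline(biomarker / naturalcubic basis=tpf(noint)
                                    knotmethod=percentiles(3));
    model time*status(0) = spl age;   /* AGE stands in for your other covariates */
    store work.cox_spline;            /* save the fitted model for PROC PLM */
run;

/* Grid of biomarker values with the other covariates held fixed */
data work.grid;
    age = 60;                          /* arbitrary reference value */
    do biomarker = 0 to 10 by 0.1;     /* adjust to the observed range */
        output;
    end;
run;

/* Score the grid: the predicted value is the linear predictor x'beta */
proc plm restore=work.cox_spline;
    score data=work.grid out=work.pred predicted=xbeta;
run;

proc sgplot data=work.pred;
    series x=biomarker y=xbeta;
    yaxis label="Linear predictor (log relative hazard)";
run;
```

Only the shape of the curve matters here. The vertical position depends on the values at which the other covariates are fixed.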

 

Another solution, the easiest one, is to do a grid search.
This solution is computationally greedy and not intelligent!
You just try 20 (or XX) thresholds to find the best one.

It's a mere loop over 20 (or XX) possibilities followed by comparison of the 20 (or XX) results. To be built inside a macro or via data-driven code generation!
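
Such a loop could look roughly like the macro below. This is only a sketch: the macro name %GRID_SEARCH, its parameters and the WORK dataset names are invented for illustration, and the covariates are assumed to be numeric (otherwise a CLASS statement is needed).

```sas
%macro grid_search(data=, time=, status=, censor=0, marker=, covars=, cutoffs=);
    %local i c;
    /* Start from an empty result set */
    proc datasets lib=work nolist nowarn; delete grid_results; quit;

    %let i = 1;
    /* Space-delimited scan, so decimal points in the cutoffs are preserved */
    %do %while (%length(%scan(&cutoffs, &i, %str( ))) > 0);
        %let c = %scan(&cutoffs, &i, %str( ));

        /* Dichotomize the marker at the current cutoff */
        data work._tmp;
            set &data;
            if missing(&marker) then x = .;
            else x = (&marker >= &c);
        run;

        /* Fit the Cox model quietly and capture the estimates */
        ods exclude all;
        ods output ParameterEstimates=work._pe;
        proc phreg data=work._tmp;
            model &time*&status(&censor) = x &covars / risklimits;
        run;
        ods exclude none;

        /* Keep only the row for X and tag it with the cutoff */
        data work._pe;
            set work._pe;
            where upcase(Parameter) = "X";
            cutoff = &c;
        run;
        proc append base=work.grid_results data=work._pe force; run;

        %let i = %eval(&i + 1);
    %end;

    /* Compare the candidate cutoffs */
    proc sort data=work.grid_results; by descending HazardRatio; run;
    proc print data=work.grid_results;
        var cutoff Estimate HazardRatio ProbChiSq;
    run;
%mend grid_search;
```

A call might look like `%grid_search(data=work.patients, time=time, status=status, marker=biomarker, covars=age, cutoffs=1 1.5 2 2.5 3);` with hypothetical names again. Keep in mind that picking the cutoff with the largest hazard ratio (or smallest p-value) out of many candidates makes the reported p-value too optimistic, so the selected threshold should be validated on other data.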

The last possibility is an intelligent search for the best threshold.
But that can be mathematically cumbersome. You need to write an objective function that you can then maximize subject to constraints. You need SAS/OR or SAS Optimization for that (PROC OPTMODEL or PROC OPTLSO).
LSO = Local Search Optimization (using GA = Genetic Algorithms is sometimes easier).

Kind regards, Koen

 
Ubai
Quartz | Level 8

Hi @sbxkoenk,

 

Thanks for the detailed answer. I do have a fully adjusted Cox model. All established factors associated with survival were included in the model. I have prepared a DAG, and I think it is plausible to assume that the biomarker expression has a main effect on survival and does not interact with other explanatory variables such as treatment.

 

My plan was to plot a smooth hazard ratio curve using spline effects and try to guess the threshold from it. However, my sample size is relatively small (about 100 patients).

sbxkoenk
SAS Super FREQ

Yes, 100 patients is not that many.

 

I actually never do survival analysis on living organisms (patients / animals / plants).
I only do it on things (like machines or machine parts). Never any problems with small datasets there 😁.

 

I would try it anyway with that spline effect. Maybe you see a kink in the curve somewhere.

 

Good luck,

Koen
