TJ87
Obsidian | Level 7

I understand LASSO model selection can be applied to survival data (Cox proportional hazards models). Please see https://www.ncbi.nlm.nih.gov/pubmed/9044528.

 

Could anyone please suggest some SAS syntax for this? Anything in Proc HPGENSELECT? Thanks.    

2 REPLIES
JacobSimonsen
Barite | Level 11

I have the same need, but came to the conclusion that it is not in SAS (yet).

 

It is possible to run a Cox regression in Proc HPGENSELECT, but it may not work very well. You need to make an aggregated dataset in which the information on each risk set is collected in the same records (one record for each distinct combination of covariates and risk set). A Poisson regression model with the time variable included as a class variable and log(number at risk) as an offset variable is then equivalent to a Cox regression. There are at least two drawbacks to this method: 1) there can be overwhelmingly many parameters, because each risk set adds one parameter to the model, and 2) the time variable in the model statement can cause trouble with the LASSO method (time has to be included and is not allowed to leave the model). If you want to try, you can use the cox-aggregate macro I attached to this article: Cox-aggreate.

Here is an example of how to estimate the parameters of a Cox regression with HPGENSELECT:

 

data simulation;
  array covariate{10};
  entry=0;
  event=1;
  do i=1 to 1000;
    do k=1 to 10;
      covariate[k]=rand('bernoulli',0.5);
    end;
    rate=exp(-0.5*covariate2-0.1*covariate4+0.1*covariate5+0.8*covariate6);
    t=rand('exponential',1/rate);
    output;
  end;
run;
%coxaggregate(data=simulation,output=dataout,entry=entry,exit=t,event=event,covariate=covariate1+covariate2+covariate3+covariate4+covariate5+covariate6+covariate7+covariate8+covariate9+covariate10,type=logistic);

*(aggregated data may be larger than original data);

data dataout;
  set dataout;
  log_atrisk=log(n);
run;

*In this model you can do some variable selection. But always keep time in the model!;
proc hpgenselect data=dataout;
  class time;
  model d=covariate1-covariate10 time/dist=poisson link=log offset=log_atrisk;
run;

*equivalent to using phreg on the non-aggregated data;
proc phreg data=simulation;
  model t=covariate1-covariate10;
run;
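
For anyone wondering why the Poisson model with time as a class variable reproduces the Cox estimates, here is a short sketch of the argument. The notation is my own and not taken from the macro: j indexes the risk sets (distinct event times), k indexes the covariate patterns with covariate vector x_k, n_{jk} is the number at risk and d_{jk} the number of events in that cell, and alpha_j is the parameter for time level j. I assume the Breslow form of the partial likelihood, which is the default handling of ties in PHREG.

```latex
% Poisson model fitted to the aggregated data (offset = log n_{jk}):
\[
  d_{jk} \sim \mathrm{Poisson}(\mu_{jk}), \qquad
  \log \mu_{jk} = \log n_{jk} + \alpha_j + x_k^{\top}\beta .
\]
% Log-likelihood, dropping terms that do not depend on (\alpha, \beta):
\[
  \ell(\alpha,\beta) = \sum_j \sum_k
    \Bigl[ d_{jk}\,(\alpha_j + x_k^{\top}\beta)
           - n_{jk}\exp(\alpha_j + x_k^{\top}\beta) \Bigr] .
\]
% Setting \partial\ell/\partial\alpha_j = 0, with d_j = \sum_k d_{jk}:
\[
  \exp(\hat\alpha_j) = \frac{d_j}{\sum_k n_{jk}\exp(x_k^{\top}\beta)} .
\]
% Substituting back gives the profile log-likelihood in \beta:
\[
  \ell_p(\beta) = \sum_j \Bigl[ \sum_k d_{jk}\, x_k^{\top}\beta
     - d_j \log \sum_k n_{jk}\exp(x_k^{\top}\beta) \Bigr]
     + \sum_j \bigl( d_j \log d_j - d_j \bigr) .
\]
```

The first sum is the Breslow partial log-likelihood of the Cox model and the last sum does not involve beta, so both approaches are maximized by the same beta. It also shows where the first drawback comes from: there is one alpha_j for every risk set.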
TJ87
Obsidian | Level 7

Wow thanks I'll give this a try.
