Programming the statistical procedures from SAS

LASSO Cox proportional hazards model

Contributor
Posts: 44


I understand that LASSO model selection can be applied to survival data (Cox proportional hazards models). Please see https://www.ncbi.nlm.nih.gov/pubmed/9044528.

Could anyone please suggest some SAS syntax for this? Anything in PROC HPGENSELECT? Thanks.
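For reference, the LASSO Cox model is usually written as a penalized partial likelihood. With event indicators δ_i, risk sets R(t_i), covariate vectors x_i and p coefficients, a standard formulation (written here without tied event times) is:

```latex
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(x_i^{\top}\beta)}{\sum_{j \in R(t_i)} \exp(x_j^{\top}\beta)}

\hat\beta(\lambda) = \arg\min_{\beta}
  \Bigl\{ -\log L(\beta) + \lambda \sum_{m=1}^{p} \lvert \beta_m \rvert \Bigr\}
```

Equivalently, one can minimize -log L(β) subject to Σ|β_m| ≤ s. A larger λ (smaller s) sets more coefficients exactly to zero, which is what turns the penalty into a variable selection method.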

Super Contributor
Posts: 287

Re: LASSO Cox proportional hazards model

I have the same need, but came to the conclusion that it is not in SAS (yet).

It is possible to run a Cox regression in PROC HPGENSELECT, but it may not work very well. You need to make an aggregated dataset such that the information on each risk set is collected in the same records (one record for each distinct combination of covariates and risk set). A Poisson regression on these records, with the time variable included as a class variable and log(number at risk) as an offset variable, is then equivalent to a Cox regression. There are at least two drawbacks to this method: 1) there can be overwhelmingly many parameters, because each risk set adds one parameter to the model, and 2) the time variable in the MODEL statement can cause trouble for the LASSO method (time has to be included and must not be allowed to leave the model). If you want to try, you can use the cox-aggregate macro I attached to this article: Cox-aggregate.

Here is an example of how to estimate the parameters of a Cox regression with HPGENSELECT:

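*Simulate 1000 subjects with 10 binary covariates and exponential event times (no censoring);
data simulation;
  array covariate{10};
  entry=0;
  event=1;
  do i=1 to 1000;
    do k=1 to 10;
      covariate[k]=rand('bernoulli',0.5);
    end;
    *The hazard depends on covariates 2, 4, 5 and 6 only;
    rate=exp(-0.5*covariate2-0.1*covariate4+0.1*covariate5+0.8*covariate6);
    *Exponential event time with hazard equal to rate;
    t=rand('exponential')/rate;
    output;
  end;
run;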
%coxaggregate(data=simulation,output=dataout,entry=entry,exit=t,event=event,covariate=covariate1+covariate2+covariate3+covariate4+covariate5+covariate6+covariate7+covariate8+covariate9+covariate10,type=logistic);

*Note: the aggregated data may be larger than the original data;

data dataout;
  set dataout;
  log_atrisk=log(n);
run;

*In this model you can perform variable selection. But always keep time in the model!;
proc hpgenselect data=dataout;
  class time;
  model d=covariate1-covariate10 time / dist=poisson link=log offset=log_atrisk;
run;

*Equivalent to using PHREG on the non-aggregated data;
proc phreg data=simulation;
  model t=covariate1-covariate10;
run;
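Why the Poisson model reproduces the Cox estimates can be sketched as follows. The notation is introduced only for this derivation: k indexes the distinct event times (risk sets), g the covariate patterns x_g, n_{kg} is the number at risk and d_{kg} the number of events for pattern g in risk set k (presumably what the variables n and d in the aggregated data hold). The parameters α_k are the per-risk-set effects that CLASS TIME adds, and log n_{kg} is the offset:

```latex
d_{kg} \sim \operatorname{Poisson}(\mu_{kg}), \qquad
\log \mu_{kg} = \alpha_k + x_g^{\top}\beta + \log n_{kg}

\ell(\alpha,\beta) = \sum_{k}\sum_{g}\Bigl[ d_{kg}\bigl(\alpha_k + x_g^{\top}\beta\bigr)
  - n_{kg}\, e^{\alpha_k + x_g^{\top}\beta} \Bigr] + \text{const}

\frac{\partial \ell}{\partial \alpha_k} = 0
\;\Longrightarrow\;
e^{\hat\alpha_k} = \frac{d_k}{\sum_g n_{kg}\, e^{x_g^{\top}\beta}},
\qquad d_k = \sum_g d_{kg}

\ell\bigl(\hat\alpha(\beta),\beta\bigr) = \sum_k \Bigl[ \sum_g d_{kg}\, x_g^{\top}\beta
  - d_k \log \sum_g n_{kg}\, e^{x_g^{\top}\beta} \Bigr] + \text{const}
```

Since Σ_g n_{kg} exp(x_g'β) is the sum of exp(x_j'β) over everyone in risk set k, the last line is the Cox log partial likelihood with Breslow handling of ties, so both models give the same β. It also shows why time must stay in the model unpenalized: the α_k are what turn the Poisson likelihood into the partial likelihood, and there is one per risk set.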
Contributor
Posts: 44

Re: LASSO Cox proportional hazards model

Wow, thanks! I'll give this a try.
