deschue
Calcite | Level 5

I have inherited some SAS code that uses PROC LIFEREG with inest= to score data with a survival model, and I need to convert it to an algorithm that can score records without SAS (using only SQL).

proc lifereg data=indata inest=in_coeff noprint;
   model days_til_lapse * censor_flag2(0) = var1 var2
         / dist=lognormal maxiter=0;
   output out=score quantile=.49 p=median std=s;
run;

The in_coeff dataset contains 1 record with intercept, var1, var2, X_scale_, x_Dist, etc. 

  • intercept = 3.965421927
  • var1 = 0.0604628213011544
  • var2 = 0.0903378482875944

I can see in the SAS code that the only thing that is used out of the SCORE dataset is the ID of the records and MEDIAN, where median is the output score that is used as the predictive value for each of the records.

 

I attempted to create an algorithm like I would do in PROC SCORE to re-create the predicted 'median' that is output using:

 pred_median = 3.965421927 + 0.0604628213011544 * var1 + 0.0903378482875944 * var2

This doesn't seem to be giving me the desired outcome. 

 

Can this be done? 

I have not used PROC LIFEREG before, so I'm not sure which of these is true: whether this algorithm can be applied as written to score all records, whether I just need an additional conversion of the PRED_MEDIAN value that I calculated (based on the distribution used in LIFEREG), or whether this cannot be reproduced in basic SQL logic without SAS or some other statistical software.

 

Thanks!

3 REPLIES
Reeza
Super User
How is your equation accounting for the lognormal distribution? Did you convert the parameters? For things like this, I'll usually use PROC PLM, PROC SCORE, or the SCORE or CODE statement within the procedure itself to score new data, and then calculate the median from the scored data.
deschue
Calcite | Level 5

I have not done any conversion of my parameters yet. So are you saying that I need to convert the "input" independent variables (var1 and var2) before running them through this algorithm? I was thinking that I might need to convert the "output" value (pred_median) after running it through this algorithm, but I wasn't sure.

 

From what I read, the 'inest=' option is for providing initial estimates for the model so that it is more likely to converge. Are these really coefficients that can be used in an algorithm like the one I have attempted?

 

(Remember, I need to convert this SAS code to SQL, since I don't have SAS available in the production scoring environment.  So utilizing a different SAS procedure to 'score' the records isn't going to help me.)

Reeza
Super User

You'll need to convert something, but honestly I don't recall exactly what at this point. 

You're treating it as a linear regression model, but that's not what this is. 
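
To make the difference concrete, here is my understanding of what a lognormal model in LIFEREG fits, assuming the default log transformation of the response (no NOLOG option). The symbols are my own notation: the betas are the intercept and the var1/var2 coefficients, sigma is the scale parameter stored with the estimates, and Phi^{-1} is the standard normal quantile function. Treat this as a sketch to check against the documentation, not a confirmed derivation:

```latex
\log T = \beta_0 + \beta_1\,\mathrm{var1} + \beta_2\,\mathrm{var2} + \sigma\,\varepsilon,
\qquad \varepsilon \sim N(0, 1)

t_q = \exp\!\bigl(\beta_0 + \beta_1\,\mathrm{var1} + \beta_2\,\mathrm{var2} + \sigma\,\Phi^{-1}(q)\bigr)
```

So the linear combination is on the log scale, and the predicted quantile needs an exponentiation plus a scale term that depends on the requested quantile q (0.49 in the posted code, so not exactly the median).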

 

You can see in the documentation how it's handled for a Tobit regression - they have the full code to score there and you'll need to do something similar for the lognormal. 
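
For the lognormal case, a minimal sketch of the scoring step might look like the following. It is written in Python only so the arithmetic can be checked outside SAS. The three coefficients are the ones quoted in the question; the `scale` value and the var1/var2 inputs are hypothetical placeholders, since the actual scale in in_coeff was not posted. It also assumes that with maxiter=0 the inest= values are used unchanged as the model parameters, which is worth verifying against the SAS output before relying on it.

```python
import math
from statistics import NormalDist

# Coefficients quoted in the question (from the in_coeff data set).
INTERCEPT = 3.965421927
B_VAR1 = 0.0604628213011544
B_VAR2 = 0.0903378482875944


def lognormal_quantile(var1, var2, scale, q=0.49):
    """Predicted q-th quantile of a lognormal accelerated failure time model.

    `scale` is the model's scale parameter; it must be read from the
    coefficient data set (its value was not given in the post).
    """
    # Linear predictor on the log-time scale.
    xb = INTERCEPT + B_VAR1 * var1 + B_VAR2 * var2
    # Standard normal quantile for q (about -0.02507 when q = 0.49).
    z_q = NormalDist().inv_cdf(q)
    # Back-transform from log time to time.
    return math.exp(xb + scale * z_q)


# Hypothetical record and hypothetical scale, for illustration only.
print(lognormal_quantile(var1=1.0, var2=2.0, scale=1.0))
```

Because q is fixed, the normal quantile is just a constant, so in SQL this should reduce to a single expression of the form EXP(intercept + b1 * var1 + b2 * var2 + scale * z), assuming your SQL dialect has an EXP function. Compare a handful of records against the MEDIAN column from the SAS run before trusting it.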

 

I don't think you're using the right estimates to model your data either, as you've indicated. You need to use the parameter estimates. InEst exists before you model your data, you want OUTEST instead, which will hold your output estimates.

https://documentation.sas.com/?docsetId=statug&docsetVersion=15.1&docsetTarget=statug_lifereg_syntax...

 
