Survival in EM: How to obtain hazard for future t periods

Objective

Obtain a final dataset with one row per unique customer ID, plus 36 columns holding the subhazard1 probability for each of the next 36 periods and 36 columns holding the subhazard2 probability.

Using the Survival node, I can obtain the probability at period +36 just fine, but not for the 35 periods in between.

My setup

I have an unexpanded dataset in which each row is a unique ID. Each row has a start_date, an end_date, and an indicator y: 0 (still active), 1 (voluntary end), or 2 (involuntary end). If y = 0, the end date is blank. There are also 16 numeric explanatory variables with no missing values. I am running the data through Enterprise Miner like this: [Data Source] --> [Data Partition] --> [Survival]


[Survival] Node Settings:

Data Format: Standard

Time Interval: Month

Left-Truncated: No

Left Training Time Range: 1/1/1995 - 5/31/2015

Sampling: No

Covariate x Time Interactions: Do Not Include

Survival Validation Method: Default

Mean Residual Life: None

Default Forecast Intervals: No

Number of Forecast Intervals: 36


Training Data Example:

id      start_date  end_date   y  x1            x2            x3            x14           x15           x16
100010  01May2010   .          0  -0.993465152  -0.066721505  -0.524761341  0.851112375   -0.200641128  -0.026184704
100024  01Jul2009   18Oct2012  1  0.640795598   0.510798998   0.313453708   0.851112375   -0.415644559  -0.026184704
100090  01Oct2002   .          0  -1.313670924  1.145826894   0.313453708   0.851112375   -0.531334657  -0.026184704
100170  01Jul2010   .          0  1.015422702   0.765797739   -1.498333852  0.851112375   -0.657759972  -0.026184704
100182  01Jun2003   12Mar2013  1  0.2238052     -0.558561508  -0.4076856    -0.561356819  -0.200641128  -0.026184704
100218  01Dec2002   .          0  -0.2292475    -0.558561508  0.072210717   -0.561356819  0.153832731   -0.026184704
100286  01May2006   .          0  -0.361361614  -0.558561508  0.313453708   -0.561356819  0.608860331   -0.026184704
100304  01Oct2008   12Aug2014  1  -0.143914825  -0.558561508  -0.524761341  -0.561356819  -0.657759972  -0.026184704
100316  01Oct2008   20Apr2014  1  -0.2292475    0.765797739   0.313453708   0.851112375   -0.038251766  -0.026184704
100340  01Jun2010   .          0  -0.143914825  -1.669116518  -0.524761341  -0.561356819  -0.200641128  -0.026184704
100418  01Jul2009   09Nov2012  1  0.450310375   0.269509137   -0.4076856    -0.561356819  0.006216554   -0.026184704
100440  21Oct2010   22Apr2014  2  0.640795598   -0.558561508  0.072210717   -0.561356819  0.608860331   -0.026184704
100444  01Sep2008   .          0  -0.454616577  0.765797739   -0.4076856    -0.561356819  0.476907328   -0.026184704
100458  01Oct2000   .          0  -0.7113196    -0.066721505  0.313453708   -0.561356819  0.153832731   -0.026184704
100460  01Dec2004   .          0  -0.272496522  -0.066721505  0.072210717   0.851112375   0.153832731   -0.026184704
100520  01Mar2004   .          0  -0.822476278  -1.596288431  0.313453708   -0.561356819  -0.200641128  -0.026184704
100564  01Feb2009   .          0  0.960178211   -0.066721505  -0.524761341  -0.561356819  0.608860331   -0.026184704
100578  01Apr2009   .          0  -1.494276815  0.269509137   0.313453708   -0.561356819  -0.518154111  -0.026184704
100606  01Jul2001   .          0  -0.2292475    -1.519998515  0.313453708   -0.561356819  0.608860331   -0.026184704
100626  01Aug2003   02Jul2012  1  0.332638254   0.510798998   0.072210717   0.851112375   0.153832731   -0.026184704


Data for Scoring Example:

id      start_date  end_date   x1            x2            x3            x14           x15           x16
100010  01May2010   .          -0.993465152  -0.066721505  -0.524761341  0.851112375   -0.200641128  -0.026184704
100090  01Oct2002   .          -1.313670924  1.145826894   0.313453708   0.851112375   -0.531334657  -0.026184704
100170  01Jul2010   .          1.015422702   0.765797739   -1.498333852  0.851112375   -0.657759972  -0.026184704
100218  01Dec2002   .          -0.2292475    -0.558561508  0.072210717   -0.561356819  0.153832731   -0.026184704
100286  01May2006   .          -0.361361614  -0.558561508  0.313453708   -0.561356819  0.608860331   -0.026184704
100340  01Jun2010   .          -0.143914825  -1.669116518  -0.524761341  -0.561356819  -0.200641128  -0.026184704
100444  01Sep2008   .          -0.454616577  0.765797739   -0.4076856    -0.561356819  0.476907328   -0.026184704
100458  01Oct2000   .          -0.7113196    -0.066721505  0.313453708   -0.561356819  0.153832731   -0.026184704
100460  01Dec2004   .          -0.272496522  -0.066721505  0.072210717   0.851112375   0.153832731   -0.026184704
100520  01Mar2004   .          -0.822476278  -1.596288431  0.313453708   -0.561356819  -0.200641128  -0.026184704
100564  01Feb2009   .          0.960178211   -0.066721505  -0.524761341  -0.561356819  0.608860331   -0.026184704
100578  01Apr2009   .          -1.494276815  0.269509137   0.313453708   -0.561356819  -0.518154111  -0.026184704
100606  01Jul2001   .          -0.2292475    -1.519998515  0.313453708   -0.561356819  0.608860331   -0.026184704
100638  01Mar2008   .          -0.361361614  -0.558561508  -0.524761341  -0.561356819  0.153832731   -0.026184704
100668  01Jan2008   .          0.2238052     -0.558561508  -0.524761341  -0.561356819  -0.200641128  -0.026184704
100764  01Jan2010   .          -0.407366285  0.765797739   0.313453708   0.851112375   -0.518154111  -0.026184704
100880  01Jan2002   .          0.277114766   -0.066721505  0.072210717   -0.561356819  0.153832731   -0.026184704
100928  01Dec1996   .          -0.503209025  -0.066721505  0.072210717   -0.561356819  -0.518154111  -0.026184704
101012  01Jan2005   .          0.075963483   -0.066721505  0.072210717   -0.561356819  -0.43304346   -0.026184704
101026  01Apr2010   .          -1.1612639    1.145826894   0.313453708   0.257502307   -0.038251766  -0.026184704
101194  01May2010   .          0.8372129     0.269509137   -1.303489188  0.851112375   -0.32305167   -0.026184704
101218  01Jun2010   .          -1.399701071  -0.066721505  -1.303489188  0.851112375   0.006216554   -0.026184704
101280  01Sep2005   .          -0.361361614  -0.066721505  0.072210717   -0.561356819  -0.657759972  -0.026184704
101312  01Jun2007   .          -0.553195345  -0.96472101   0.072210717   0.257502307   -0.657759972  -0.026184704
101374  01May1999   .          1.015422702   -0.558561508  0.313453708   -0.561356819  0.153832731   -0.026184704
101394  01Apr2008   .          0.172619332   -0.066721505  -1.498333852  0.851112375   0.153832731   -0.026184704
101404  01Mar2006   .          -0.503209025  -1.519998515  -0.4076856    -0.561356819  -0.531334657  -0.026184704
101414  01Jul2004   .          0.075963483   -0.96472101   0.313453708   -0.561356819  -0.331707518  -0.026184704
101534  01Apr2010   .          -1.468078164  0.765797739   -1.303489188  0.851112375   -0.518154111  0.305042049
101566  01Sep2010   .          -0.7113196    0.765797739   0.313453708   -0.561356819  0.476907328   -0.026184704


Scored Data Sample:

n_account_id  START_DATE  EM_SURVIVAL  EM_SURVFCST  EM_SURVEVENT  EM_HAZARD    EM_HZRDFCST  _T_  T_FCST  EM_SUBHZRD1  EM_SUBHZRD2  EM_SUBHZRD0  EM_SUBHZRD1_SURV  EM_SUBHZRD2_SURV  EM_SUBHZRD0_SURV
100090        01Oct2002   0.972084662  0.454812409  0.532126751   0.027915338  0.011483039  0    36      0.012369578  0.015545761  0.972084662  0.008407648       0.003075391       0.988516961
100170        01Jul2010   0.980428083  0.595018696  0.393103169   0.019571917  0.006805636  0    36      0.006142343  0.013429575  0.980428083  0.004159031       0.002646605       0.993194364
100218        01Dec2002   0.975502536  0.491930957  0.495715347   0.024497464  0.010824647  0    36      0.012470412  0.012027052  0.975502536  0.008452113       0.002372534       0.989175353
100286        01May2006   0.993152694  0.808182911  0.18624506    0.006847306  0.003659073  0    36      0.004885971  0.001961335  0.993152694  0.003276291       0.000382782       0.996340927
100444        01Sep2008   0.978615459  0.523747551  0.464807605   0.021384541  0.010562998  0    36      0.013268177  0.008116364  0.978615459  0.008966582       0.001596416       0.989437002
100458        01Oct2000   0.987513584  0.679403446  0.312005974   0.012486416  0.006569076  0    36      0.00865403   0.003832387  0.987513584  0.005819057       0.000750019       0.993430924
100460        01Dec2004   0.985659994  0.633627574  0.357154011   0.014340006  0.008021131  0    36      0.01093209   0.003407916  0.985659994  0.007353906       0.000667225       0.991978869
100520        01Mar2004   0.989466858  0.716193892  0.276182031   0.010533142  0.00585651   0    36      0.007976627  0.002556515  0.989466858  0.005356815       0.000499695       0.99414349
100578        01Apr2009   0.987593851  0.688728613  0.302619582   0.012406149  0.006086399  0    36      0.007669291  0.004736857  0.987593851  0.005158995       0.000927404       0.993913601
100606        01Jul2001   0.993015545  0.802277033  0.192080088   0.006984455  0.003853459  0    36      0.00523903   0.001745425  0.993015545  0.003512834       0.000340625       0.996146541
100764        01Jan2010   0.982877499  0.613561701  0.375749571   0.017122501  0.007328854  0    36      0.008284089  0.008838412  0.982877499  0.005592299       0.001736555       0.992671146
100880        01Jan2002   0.989436032  0.731832684  0.260353717   0.010563968  0.004980109  0    36      0.006113659  0.004450308  0.989436032  0.004109461       0.000870648       0.995019891
100928        01Dec1996   0.989087318  0.717395186  0.274689733   0.010912682  0.005523562  0    36      0.007113161  0.003799521  0.989087318  0.004780377       0.000743185       0.994476438
101012        01Jan2005   0.988657426  0.706723152  0.285168823   0.011342574  0.00581433   0    36      0.007545221  0.003797353  0.988657426  0.005071463       0.000742867       0.99418567
101026        01Apr2010   0.966975295  0.399236355  0.587128692   0.033024705  0.012964184  0    36      0.013244448  0.019780257  0.966975295  0.009036307       0.003927877       0.987035816
101194        01May2010   0.978516373  0.497653615  0.491420247   0.021483627  0.012507562  0    36      0.017335001  0.004148626  0.978516373  0.011693086       0.000814476       0.987492438
101218        01Jun2010   0.995085646  0.848719344  0.14708915    0.004914354  0.003073292  0    36      0.004456077  0.000458276  0.995085646  0.002983975       8.93179E-05       0.996926708
101280        01Sep2005   0.991028543  0.770056197  0.222972736   0.008971457  0.00408317   0    36      0.004891316  0.004080141  0.991028543  0.003285505       0.000797665       0.99591683
101312        01Jun2007   0.983469511  0.606349406  0.383458868   0.016530489  0.008214977  0    36      0.010404358  0.006126132  0.983469511  0.007013124       0.001201853       0.991785023
101374        01May1999   0.99192727   0.777587464  0.216084195   0.00807273   0.004328435  0    36      0.005785803  0.002286927  0.99192727   0.003881857       0.000446578       0.995671565



Accepted Solution
06-28-2015 10:38 PM

Re: Survival in EM: How to obtain hazard for future t periods

Hi JBerry,

Thanks for the detailed question!!! Nice screenshots and awesome profile pic BTW!

I borrowed this idea from Wendy Czika. Do this:

1. Add a Score node after your Survival node and run it. Open the results and copy the optimized score code.

2. Connect a SAS Code node after your Data set. Open the editor and add this:

- a libname statement to create a library (remember that valid library names are 8 characters or less)
- a data statement to output your results
- a set statement for your data set
- the optimized score code you copied from the Score node
- a run statement

It would look something like this:

libname results "D:\EM\EM_Projects\EM13.2\miguel";

data results.scored36m;
  set &EM_IMPORT_DATA;
  /* your optimized score code goes here */
run;

3. Scroll all the way down to the last part of the optimized score code you pasted. Right before the end you will see the part where EM_SURVEVENT is calculated. Add the two statements marked "added" below (they were highlighted in yellow in the original post). They create the variable IntervalsInFuture and output the calculations for every period from _t0_ onward.

/***** omitted lines of code ******/
  if _T_=t0_fcst then EM_SURVEVENT=(EM_SURVIVAL-EM_SURVFCST)/0.00001;
end;
/* added: makes it easier to see how many months after the censor date we are looking at */
IntervalsInFuture = _t_ - _t0_;
/* added: output each period */
if _t_ >= _t0_ then output;
_t_+1;
end;
_T_ = _T0_;
;
end;

4. Open your results data set to confirm this worked as intended. I just tried this with my go-to example and it worked great.
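The steps above produce one row per ID per period, while the question asks for one row per ID with 36 subhazard1 columns and 36 subhazard2 columns. A minimal sketch of that last reshaping step might look like the code below. It is only a sketch under several assumptions that are not confirmed in this thread: the ID variable is named n_account_id (as in the scored data sample), EM_SUBHZRD1 and EM_SUBHZRD2 are populated on every output row, IntervalsInFuture values 1 through 36 are the periods of interest, and each ID has at most one row per period. The data set names work.long, work.sub1, work.sub2, and results.hazard_wide and the column prefixes are made up for illustration.

```sas
/* Sketch only: names other than results.scored36m, IntervalsInFuture,
   EM_SUBHZRD1 and EM_SUBHZRD2 are illustrative assumptions. */

/* Quick check: how many period rows did each ID receive? */
proc sql;
  select n_rows, count(*) as n_ids
  from (select n_account_id, count(*) as n_rows
        from results.scored36m
        group by n_account_id) as t
  group by n_rows;
quit;

/* PROC TRANSPOSE with a BY statement needs sorted input */
proc sort data=results.scored36m out=work.long;
  by n_account_id IntervalsInFuture;
run;

/* One row per ID, columns subhzrd1_1 ... subhzrd1_36 */
proc transpose data=work.long(where=(IntervalsInFuture between 1 and 36))
               out=work.sub1(drop=_:) prefix=subhzrd1_;
  by n_account_id;
  id IntervalsInFuture;
  var EM_SUBHZRD1;
run;

/* One row per ID, columns subhzrd2_1 ... subhzrd2_36 */
proc transpose data=work.long(where=(IntervalsInFuture between 1 and 36))
               out=work.sub2(drop=_:) prefix=subhzrd2_;
  by n_account_id;
  id IntervalsInFuture;
  var EM_SUBHZRD2;
run;

/* Combine both sets of columns into the final wide table */
data results.hazard_wide;
  merge work.sub1 work.sub2;
  by n_account_id;
run;
```

The drop=_: option removes the bookkeeping columns (_NAME_ and, if present, _LABEL_) that PROC TRANSPOSE adds to its output. If the row count check shows IDs with an unexpected number of periods, it is worth looking at those IDs before trusting the wide table.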

I hope this helps!

Thanks,

https://www.linkedin.com/in/migmaldonado
