Objective
Obtain a final dataset with one row per unique customer ID, plus 36 columns giving the subhazard 1 probability in each of the next 36 periods, and 36 more columns giving the subhazard 2 probability.
Using the Survival node, I can obtain the probability at period +36 just fine, but not for the 35 periods in between.
My setup
I have an unexpanded dataset in which each row is a unique ID. Each row has a start_date, an end_date, and an indicator y: 0 (still active), 1 (voluntary end), or 2 (involuntary end). If y = 0, the end date is blank. There are also 16 numeric explanatory variables with no missing values. I am running the data through Enterprise Miner like this: [Data Source] --> [Data Partition] --> [Survival]
[Survival] Node Settings:
Data Format: Standard
Time Interval: Month
Left-Truncated: No
Left Training Time Range: 1/1/1995 - 5/31/2015
Sampling: No
Covariate x Time Interactions: Do Not Include
Survival Validation Method: Default
Mean Residual Life: None
Default Forecast Intervals: No
Number of Forecast Intervals: 36
Training Data Example:
| id | start_date | end_date | y | x1 | x2 | x3 | x14 | x15 | x16 |
|---|---|---|---|---|---|---|---|---|---|
| 100010 | 01May2010 | . | 0 | -0.993465152 | -0.066721505 | -0.524761341 | 0.851112375 | -0.200641128 | -0.026184704 |
| 100024 | 01Jul2009 | 18Oct2012 | 1 | 0.640795598 | 0.510798998 | 0.313453708 | 0.851112375 | -0.415644559 | -0.026184704 |
| 100090 | 01Oct2002 | . | 0 | -1.313670924 | 1.145826894 | 0.313453708 | 0.851112375 | -0.531334657 | -0.026184704 |
| 100170 | 01Jul2010 | . | 0 | 1.015422702 | 0.765797739 | -1.498333852 | 0.851112375 | -0.657759972 | -0.026184704 |
| 100182 | 01Jun2003 | 12Mar2013 | 1 | 0.2238052 | -0.558561508 | -0.4076856 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100218 | 01Dec2002 | . | 0 | -0.2292475 | -0.558561508 | 0.072210717 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100286 | 01May2006 | . | 0 | -0.361361614 | -0.558561508 | 0.313453708 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100304 | 01Oct2008 | 12Aug2014 | 1 | -0.143914825 | -0.558561508 | -0.524761341 | -0.561356819 | -0.657759972 | -0.026184704 |
| 100316 | 01Oct2008 | 20Apr2014 | 1 | -0.2292475 | 0.765797739 | 0.313453708 | 0.851112375 | -0.038251766 | -0.026184704 |
| 100340 | 01Jun2010 | . | 0 | -0.143914825 | -1.669116518 | -0.524761341 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100418 | 01Jul2009 | 09Nov2012 | 1 | 0.450310375 | 0.269509137 | -0.4076856 | -0.561356819 | 0.006216554 | -0.026184704 |
| 100440 | 21Oct2010 | 22Apr2014 | 2 | 0.640795598 | -0.558561508 | 0.072210717 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100444 | 01Sep2008 | . | 0 | -0.454616577 | 0.765797739 | -0.4076856 | -0.561356819 | 0.476907328 | -0.026184704 |
| 100458 | 01Oct2000 | . | 0 | -0.7113196 | -0.066721505 | 0.313453708 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100460 | 01Dec2004 | . | 0 | -0.272496522 | -0.066721505 | 0.072210717 | 0.851112375 | 0.153832731 | -0.026184704 |
| 100520 | 01Mar2004 | . | 0 | -0.822476278 | -1.596288431 | 0.313453708 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100564 | 01Feb2009 | . | 0 | 0.960178211 | -0.066721505 | -0.524761341 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100578 | 01Apr2009 | . | 0 | -1.494276815 | 0.269509137 | 0.313453708 | -0.561356819 | -0.518154111 | -0.026184704 |
| 100606 | 01Jul2001 | . | 0 | -0.2292475 | -1.519998515 | 0.313453708 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100626 | 01Aug2003 | 02Jul2012 | 1 | 0.332638254 | 0.510798998 | 0.072210717 | 0.851112375 | 0.153832731 | -0.026184704 |
Data for Scoring Example:
| id | start_date | end_date | x1 | x2 | x3 | x14 | x15 | x16 |
|---|---|---|---|---|---|---|---|---|
| 100010 | 01May2010 | . | -0.993465152 | -0.066721505 | -0.524761341 | 0.851112375 | -0.200641128 | -0.026184704 |
| 100090 | 01Oct2002 | . | -1.313670924 | 1.145826894 | 0.313453708 | 0.851112375 | -0.531334657 | -0.026184704 |
| 100170 | 01Jul2010 | . | 1.015422702 | 0.765797739 | -1.498333852 | 0.851112375 | -0.657759972 | -0.026184704 |
| 100218 | 01Dec2002 | . | -0.2292475 | -0.558561508 | 0.072210717 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100286 | 01May2006 | . | -0.361361614 | -0.558561508 | 0.313453708 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100340 | 01Jun2010 | . | -0.143914825 | -1.669116518 | -0.524761341 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100444 | 01Sep2008 | . | -0.454616577 | 0.765797739 | -0.4076856 | -0.561356819 | 0.476907328 | -0.026184704 |
| 100458 | 01Oct2000 | . | -0.7113196 | -0.066721505 | 0.313453708 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100460 | 01Dec2004 | . | -0.272496522 | -0.066721505 | 0.072210717 | 0.851112375 | 0.153832731 | -0.026184704 |
| 100520 | 01Mar2004 | . | -0.822476278 | -1.596288431 | 0.313453708 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100564 | 01Feb2009 | . | 0.960178211 | -0.066721505 | -0.524761341 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100578 | 01Apr2009 | . | -1.494276815 | 0.269509137 | 0.313453708 | -0.561356819 | -0.518154111 | -0.026184704 |
| 100606 | 01Jul2001 | . | -0.2292475 | -1.519998515 | 0.313453708 | -0.561356819 | 0.608860331 | -0.026184704 |
| 100638 | 01Mar2008 | . | -0.361361614 | -0.558561508 | -0.524761341 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100668 | 01Jan2008 | . | 0.2238052 | -0.558561508 | -0.524761341 | -0.561356819 | -0.200641128 | -0.026184704 |
| 100764 | 01Jan2010 | . | -0.407366285 | 0.765797739 | 0.313453708 | 0.851112375 | -0.518154111 | -0.026184704 |
| 100880 | 01Jan2002 | . | 0.277114766 | -0.066721505 | 0.072210717 | -0.561356819 | 0.153832731 | -0.026184704 |
| 100928 | 01Dec1996 | . | -0.503209025 | -0.066721505 | 0.072210717 | -0.561356819 | -0.518154111 | -0.026184704 |
| 101012 | 01Jan2005 | . | 0.075963483 | -0.066721505 | 0.072210717 | -0.561356819 | -0.43304346 | -0.026184704 |
| 101026 | 01Apr2010 | . | -1.1612639 | 1.145826894 | 0.313453708 | 0.257502307 | -0.038251766 | -0.026184704 |
| 101194 | 01May2010 | . | 0.8372129 | 0.269509137 | -1.303489188 | 0.851112375 | -0.32305167 | -0.026184704 |
| 101218 | 01Jun2010 | . | -1.399701071 | -0.066721505 | -1.303489188 | 0.851112375 | 0.006216554 | -0.026184704 |
| 101280 | 01Sep2005 | . | -0.361361614 | -0.066721505 | 0.072210717 | -0.561356819 | -0.657759972 | -0.026184704 |
| 101312 | 01Jun2007 | . | -0.553195345 | -0.96472101 | 0.072210717 | 0.257502307 | -0.657759972 | -0.026184704 |
| 101374 | 01May1999 | . | 1.015422702 | -0.558561508 | 0.313453708 | -0.561356819 | 0.153832731 | -0.026184704 |
| 101394 | 01Apr2008 | . | 0.172619332 | -0.066721505 | -1.498333852 | 0.851112375 | 0.153832731 | -0.026184704 |
| 101404 | 01Mar2006 | . | -0.503209025 | -1.519998515 | -0.4076856 | -0.561356819 | -0.531334657 | -0.026184704 |
| 101414 | 01Jul2004 | . | 0.075963483 | -0.96472101 | 0.313453708 | -0.561356819 | -0.331707518 | -0.026184704 |
| 101534 | 01Apr2010 | . | -1.468078164 | 0.765797739 | -1.303489188 | 0.851112375 | -0.518154111 | 0.305042049 |
| 101566 | 01Sep2010 | . | -0.7113196 | 0.765797739 | 0.313453708 | -0.561356819 | 0.476907328 | -0.026184704 |
Scored Data Sample:
| n_account_id | START_DATE | EM_SURVIVAL | EM_SURVFCST | EM_SURVEVENT | EM_HAZARD | EM_HZRDFCST | _T_ | T_FCST | EM_SUBHZRD1 | EM_SUBHZRD2 | EM_SUBHZRD0 | EM_SUBHZRD1_SURV | EM_SUBHZRD2_SURV | EM_SUBHZRD0_SURV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100090 | 01Oct2002 | 0.972084662 | 0.454812409 | 0.532126751 | 0.027915338 | 0.011483039 | 0 | 36 | 0.012369578 | 0.015545761 | 0.972084662 | 0.008407648 | 0.003075391 | 0.988516961 |
| 100170 | 01Jul2010 | 0.980428083 | 0.595018696 | 0.393103169 | 0.019571917 | 0.006805636 | 0 | 36 | 0.006142343 | 0.013429575 | 0.980428083 | 0.004159031 | 0.002646605 | 0.993194364 |
| 100218 | 01Dec2002 | 0.975502536 | 0.491930957 | 0.495715347 | 0.024497464 | 0.010824647 | 0 | 36 | 0.012470412 | 0.012027052 | 0.975502536 | 0.008452113 | 0.002372534 | 0.989175353 |
| 100286 | 01May2006 | 0.993152694 | 0.808182911 | 0.18624506 | 0.006847306 | 0.003659073 | 0 | 36 | 0.004885971 | 0.001961335 | 0.993152694 | 0.003276291 | 0.000382782 | 0.996340927 |
| 100444 | 01Sep2008 | 0.978615459 | 0.523747551 | 0.464807605 | 0.021384541 | 0.010562998 | 0 | 36 | 0.013268177 | 0.008116364 | 0.978615459 | 0.008966582 | 0.001596416 | 0.989437002 |
| 100458 | 01Oct2000 | 0.987513584 | 0.679403446 | 0.312005974 | 0.012486416 | 0.006569076 | 0 | 36 | 0.00865403 | 0.003832387 | 0.987513584 | 0.005819057 | 0.000750019 | 0.993430924 |
| 100460 | 01Dec2004 | 0.985659994 | 0.633627574 | 0.357154011 | 0.014340006 | 0.008021131 | 0 | 36 | 0.01093209 | 0.003407916 | 0.985659994 | 0.007353906 | 0.000667225 | 0.991978869 |
| 100520 | 01Mar2004 | 0.989466858 | 0.716193892 | 0.276182031 | 0.010533142 | 0.00585651 | 0 | 36 | 0.007976627 | 0.002556515 | 0.989466858 | 0.005356815 | 0.000499695 | 0.99414349 |
| 100578 | 01Apr2009 | 0.987593851 | 0.688728613 | 0.302619582 | 0.012406149 | 0.006086399 | 0 | 36 | 0.007669291 | 0.004736857 | 0.987593851 | 0.005158995 | 0.000927404 | 0.993913601 |
| 100606 | 01Jul2001 | 0.993015545 | 0.802277033 | 0.192080088 | 0.006984455 | 0.003853459 | 0 | 36 | 0.00523903 | 0.001745425 | 0.993015545 | 0.003512834 | 0.000340625 | 0.996146541 |
| 100764 | 01Jan2010 | 0.982877499 | 0.613561701 | 0.375749571 | 0.017122501 | 0.007328854 | 0 | 36 | 0.008284089 | 0.008838412 | 0.982877499 | 0.005592299 | 0.001736555 | 0.992671146 |
| 100880 | 01Jan2002 | 0.989436032 | 0.731832684 | 0.260353717 | 0.010563968 | 0.004980109 | 0 | 36 | 0.006113659 | 0.004450308 | 0.989436032 | 0.004109461 | 0.000870648 | 0.995019891 |
| 100928 | 01Dec1996 | 0.989087318 | 0.717395186 | 0.274689733 | 0.010912682 | 0.005523562 | 0 | 36 | 0.007113161 | 0.003799521 | 0.989087318 | 0.004780377 | 0.000743185 | 0.994476438 |
| 101012 | 01Jan2005 | 0.988657426 | 0.706723152 | 0.285168823 | 0.011342574 | 0.00581433 | 0 | 36 | 0.007545221 | 0.003797353 | 0.988657426 | 0.005071463 | 0.000742867 | 0.99418567 |
| 101026 | 01Apr2010 | 0.966975295 | 0.399236355 | 0.587128692 | 0.033024705 | 0.012964184 | 0 | 36 | 0.013244448 | 0.019780257 | 0.966975295 | 0.009036307 | 0.003927877 | 0.987035816 |
| 101194 | 01May2010 | 0.978516373 | 0.497653615 | 0.491420247 | 0.021483627 | 0.012507562 | 0 | 36 | 0.017335001 | 0.004148626 | 0.978516373 | 0.011693086 | 0.000814476 | 0.987492438 |
| 101218 | 01Jun2010 | 0.995085646 | 0.848719344 | 0.14708915 | 0.004914354 | 0.003073292 | 0 | 36 | 0.004456077 | 0.000458276 | 0.995085646 | 0.002983975 | 8.93179E-05 | 0.996926708 |
| 101280 | 01Sep2005 | 0.991028543 | 0.770056197 | 0.222972736 | 0.008971457 | 0.00408317 | 0 | 36 | 0.004891316 | 0.004080141 | 0.991028543 | 0.003285505 | 0.000797665 | 0.99591683 |
| 101312 | 01Jun2007 | 0.983469511 | 0.606349406 | 0.383458868 | 0.016530489 | 0.008214977 | 0 | 36 | 0.010404358 | 0.006126132 | 0.983469511 | 0.007013124 | 0.001201853 | 0.991785023 |
| 101374 | 01May1999 | 0.99192727 | 0.777587464 | 0.216084195 | 0.00807273 | 0.004328435 | 0 | 36 | 0.005785803 | 0.002286927 | 0.99192727 | 0.003881857 | 0.000446578 | 0.995671565 |
Hi JBerry,
Thanks for the detailed question!!! Nice screenshots and awesome profile pic BTW!
I borrowed this idea from Wendy Czika. Do this:
1. Add a Score node after your Survival node and run it. Open the results and copy the optimized score code.
2. Connect a SAS Code node after your data set. Open the editor and add this:
-libname statement to create a library (remember that valid library names are 8 characters or less)
-data statement to output your results
-set statement for your data set
-the optimized score code you copied from the Score node
-run statement
It would look something like this:
libname results "D:\EM\EM_Projects\EM13.2\miguel";
data results.scored36m;
set &EM_IMPORT_DATA;
/* your optimized score code goes here */
run;
3. Scroll all the way down to the last part of the optimized score code you pasted. Right before the end you will see the part where EM_SURVEVENT is calculated. Add the lines marked by the two comments below (highlighted in yellow in the original post). They create the variable IntervalsInFuture and output the calculations for every period from _t0_ onward.
/***** omitted lines of code ******/
if _T_=t0_fcst then EM_SURVEVENT=(EM_SURVIVAL-EM_SURVFCST)/0.00001;
end;
/* just to make it easier to see how many months after the censor date we are looking at */
IntervalsInFuture = _t_ - _t0_;
/* output each period */
if _t_ >= _t0_ then output;
_t_+1;
end;
_T_ = _T0_;
;
end;
4. Open your results data set to confirm this worked as intended. I just tried this with my go-to example and it worked great.
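Since the original goal was one row per customer, the long output from the steps above would still need reshaping. Here is a rough, untested sketch using PROC TRANSPOSE. It assumes the ID column is n_account_id (as in the scored sample), that EM_SUBHZRD1 and EM_SUBHZRD2 hold the per-period values you want, and that IntervalsInFuture starts at 0 for the censor period. The names work.long, work.sh1, work.sh2, results.wide36m, and the subhz1_/subhz2_ prefixes are made up for illustration.

```sas
/* Quick check: how many rows did each customer get? */
proc sql;
  select min(n_rows) as min_rows, max(n_rows) as max_rows
  from (select n_account_id, count(*) as n_rows
        from results.scored36m
        group by n_account_id);
quit;

/* Keep the 36 future periods (assumption: 0 is the censor period itself)
   and sort so that BY-group processing works. */
proc sort data=results.scored36m(where=(1 <= IntervalsInFuture <= 36))
          out=work.long;
  by n_account_id IntervalsInFuture;
run;

/* One row per customer, columns subhz1_1 ... subhz1_36.
   A _LABEL_ column may also appear if the variable carries a label. */
proc transpose data=work.long out=work.sh1(drop=_name_) prefix=subhz1_;
  by n_account_id;
  id IntervalsInFuture;
  var EM_SUBHZRD1;
run;

/* Same reshaping for the second subhazard: subhz2_1 ... subhz2_36 */
proc transpose data=work.long out=work.sh2(drop=_name_) prefix=subhz2_;
  by n_account_id;
  id IntervalsInFuture;
  var EM_SUBHZRD2;
run;

/* Combine both sets of columns into the final wide table */
data results.wide36m;
  merge work.sh1 work.sh2;
  by n_account_id;
run;
```

If the row counts from the first query are not what you expect for each customer, adjust the WHERE= filter before transposing. If a different scored variable holds the per-period probability you need, swap it into the VAR statements.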
I hope this helps!
Thanks,