Survival in EM - Recreate Curves on Scored Data?

I have successfully built a survival model in EM which has a K-S of 30 in all of Train, Valid, and Test. So I assume the model is pretty good.

Here are some facts about it:

- built it on a sample of 500K obs

- unexpanded data

- no time-varying covariates

- Forecasting 36 monthly intervals

- Customer base ranges from tenure of 0 months to 250 months

- No truncation

The curves drawn by the Node Results are very nice. I can see hazard spikes during months that make sense (at 3, 12, 24, and 48 months). The survival curve also looks nice; it descends as I'd expect.

Now that I've scored the data, I'd like to replicate these curves by querying the results. However, when I try, the curves look vastly different. When I graph _t_ (tenure of customer) vs. Avg(EM_SURVEVENT), my curve looks weird; in fact, it even increases along the way!

Is there something wrong with the way I am trying to recreate these charts?

I tried graphing instant risk and subhazard functions against _T_ and it also did not match the model graphs, so I'm afraid there is something wrong.

Source of Model Curve:  SAS Survival Node => Results => click chart => Tables button at top of screen

Source of Scored Curve:

SELECT _T_ AS RELATIVE_TENURE, AVG(1-((EM_SURVIVAL-EM_SURVFCST)/EM_SURVIVAL)) AS S

FROM [Scored Results]

GROUP BY _T_;
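
A side note on the query above: the expression inside AVG() simplifies algebraically, which may help when comparing it with the node's charts. Writing S for EM_SURVIVAL and F for EM_SURVFCST (shorthand introduced here, not names from the node):

```latex
\[
1 - \frac{S - F}{S} \;=\; \frac{S - (S - F)}{S} \;=\; \frac{F}{S}
\]
```

So the query averages the ratio EM_SURVFCST / EM_SURVIVAL within each _T_. If, as the column names suggest (an assumption, not confirmed in this thread), EM_SURVIVAL is the survival probability at the customer's current tenure and EM_SURVFCST is the survival probability at the end of the forecast window, then that ratio is a conditional survival over the window rather than the unconditional survival function the node plots. A conditional quantity like that does not have to decrease with _T_.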


Re: Survival in EM - Recreate Curves on Scored Data?

Hey JBerry,

Not sure I follow the second part, especially the instant risk, but I am no expert in survival analysis. I use this node a lot, mostly to get hazard functions, but I am still very low on the learning curve.

It sounds to me like you were trying to get the survival function?

If I were to redo the survival function in EM, I would add something like the code below in a SAS Code node. Note that your Survival node creates a dataset, _ehcendata, which summarizes events, event dates, and _y_.

You can use that to get the curves you were looking for.

Add the code below to a SAS Code node and connect it to your Survival node.

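   /* Copy the Survival node's summary table into WORK */
   data ehcendata;
      set &EM_LIB..&EM_METASOURCE_NODEID._ehcendata;
   run;

   /* Life-table estimates; event=0 is treated as the censoring value */
   proc lifetest data=ehcendata method=LT;
      time _y_*event(0);
   run;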

If you run it in Base SAS, you also get the plots (replace &EM_LIB with your workspace library and &EM_METASOURCE_NODEID with your Survival node ID).

