
Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

Occasional Contributor
Posts: 5

Hi,

 

I was hoping someone may be able to help me with some confusion around the Survival node in Enterprise Miner, particularly with the way that my data is structured.

 

Some quick background: I'm attempting to model the time to default on a mortgage, where default is a simple 1/0 binary flag. Every observation has a _T_ variable, which is my time since observation. It starts at 0 and increases by one for each following month of observation, and an end date is recorded if the event above happens.

 

Within my data, _t_ = 0 is the initial observation month, and no accounts have hit the event at that point. As a result, I've noticed a couple of issues with the way the cubic splines are being calculated: the hazard function has a sharp spike at _t_ = 1, which is actually where my highest event rate occurs.
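To make the pattern concrete, here is a minimal Python sketch (not Enterprise Miner code) that computes the raw month-by-month hazard, i.e. events divided by accounts at risk, from expanded data. The rows and the `empirical_hazard` helper are made up for illustration, and the sketch assumes the event flag is set only on an account's final row. The only things taken from the post are the one-row-per-account-per-month layout and the absence of events at _t_ = 0.

```python
from collections import Counter

# Hypothetical expanded rows: (account_id, t, event).
# Assumption: event is 1 only on the row for the month the default happens.
rows = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0), ("B", 2, 0), ("B", 3, 0),
    ("C", 0, 0), ("C", 1, 1),
    ("D", 0, 0), ("D", 1, 0), ("D", 2, 1),
    ("E", 0, 0), ("E", 1, 0), ("E", 2, 0), ("E", 3, 0),
]

def empirical_hazard(rows):
    """Return {t: events at t / accounts at risk at t}."""
    at_risk = Counter()
    events = Counter()
    for _account, t, event in rows:
        at_risk[t] += 1
        events[t] += event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}

hazard = empirical_hazard(rows)
for t, h in hazard.items():
    print(f"t={t}: hazard={h:.2f}")
# t=0: hazard=0.00
# t=1: hazard=0.40
# t=2: hazard=0.33
# t=3: hazard=0.00
```

With no events possible at _t_ = 0, the raw hazard jumps from exactly zero to its maximum at _t_ = 1, which is the kind of step a smooth curve then has to bend around.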

 

Having noticed that the sharp spike is being used as part of the spline fitting, I realise this is most likely not what I want. Am I correct to include _t_ = 0, which has no events, in my dataset when using the Survival node?

 

 

Please let me know if some more information is needed.

 

Thanks!

Nathan



All Replies
Super User
Posts: 17,780

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion


nathanb_1993 wrote:

Hi,

 

I was hoping someone may be able to help me with some confusion around the Survival node in Enterprise Miner, particularly with the way that my data is structured.

Some quick background: I'm attempting to model the time to default on a mortgage, where default is a simple 1/0 binary flag. Every observation has a _T_ variable, which is my time since observation. It starts at 0 and increases by one for each following month of observation, and an end date is recorded if the event above happens.

 

 

Without fully seeing what you're doing, my initial concern is that your data structure is not correct for survival modeling. Review the generated code; if it's a PROC PHREG, then your data structure will likely need to be modified. You can review the examples in the PROC PHREG documentation to see how to set up your data.

 

Occasional Contributor
Posts: 5

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

Hi Reeza,

 

Below is an image which shows 3 (simple) examples of observations in my dataset. Hopefully this will help in understanding my data setup; obviously my actual dataset is considerably larger than this! The image shows the two outcomes in my dataset: either EVENT = 1 if the event eventually happened, in which case END_DATE is populated, or EVENT = 0 at the end of the ID, in which case END_DATE is not populated. Each ID is independent.

Data Set Up.PNG

Don't worry about the extra variables; they're just in there as an example.

 

As mentioned previously, there are no instances where start_date = end_date, since _t_ = 0 is the initial observation month and has no events.


Super User
Posts: 17,780

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

Yeah, I'm pretty sure that's not the setup required for survival analysis, at least not in Base SAS. From what I understand, you need a single record per ID. This assumes it uses PROC PHREG behind the scenes. If it doesn't, then you may be OK, but I strongly suspect this is not the correct data structure.
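If the single-record layout does turn out to be what's needed, an expanded table can be collapsed back to one row per ID. Below is a rough Python sketch of that step, using hypothetical (id, t, event) tuples rather than the poster's actual columns. The duration is taken as the largest _t_ seen for an ID, and the event flag as 1 if any of its rows carries a 1, so it works whether the flag is repeated on every row or only set on the last one.

```python
# Hypothetical expanded rows: (account_id, t, event).
expanded = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0), ("B", 2, 0), ("B", 3, 0),
    ("D", 0, 0), ("D", 1, 0), ("D", 2, 1),
]

def collapse_to_one_row(rows):
    """Return {account_id: (duration, event)} with one entry per ID."""
    summary = {}
    for account, t, event in rows:
        duration, flag = summary.get(account, (0, 0))
        # Keep the latest month seen and remember whether an event ever occurred.
        summary[account] = (max(duration, t), max(flag, event))
    return summary

one_row = collapse_to_one_row(expanded)
print(one_row)
# {'A': (1, 1), 'B': (3, 0), 'D': (2, 1)}
```

Account B never defaults in the observed window, so it comes out as censored at month 3 (event = 0).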

Occasional Contributor
Posts: 5

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

For Enterprise Miner, we believe the data has to be set up in this structure, based on the SAS documentation and guidance available: we need multiple records per ID, one for each change in the _t_ variable, which is time since observation.

 

No worries though, I'll keep this in mind and see if anything is happening behind the scenes. Thanks for the input.

Solution
3 weeks ago
SAS Super FREQ
Posts: 271

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

If you have time-varying covariates, then yes, you need to use an expanded form of the data, but if not, you can have just one observation per ID.

There are a couple of videos available that can help you with formatting your data. This one covers the standard format without time-varying covariates:

http://www.sas.com/apps/webnet/video-sharing.html?player=brightcove&width=640&height=360&autoStart=t...

 

And this one for the expanded format: 

http://www.sas.com/apps/webnet/video-sharing.html?player=brightcove&width=640&height=360&autoStart=t...
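For reference, going from one row per ID to an expanded layout amounts to emitting one row per ID per month up to the observed duration. The sketch below is plain Python with hypothetical field names. It is not necessarily the layout the Survival node expects (the videos above are the place to confirm that). It assumes the event flag is set only on the final month of an ID that had the event.

```python
def expand(records):
    """Turn (account_id, duration, event) into one (account_id, t, event_at_t) row per month."""
    rows = []
    for account, duration, event in records:
        for t in range(duration + 1):
            # Flag the event only in the month it actually occurs.
            event_at_t = int(event == 1 and t == duration)
            rows.append((account, t, event_at_t))
    return rows

# Hypothetical one-row-per-ID input: A defaults in month 1, B is censored at month 2.
records = [("A", 1, 1), ("B", 2, 0)]
expanded_rows = expand(records)
for row in expanded_rows:
    print(row)
# ('A', 0, 0)
# ('A', 1, 1)
# ('B', 0, 0)
# ('B', 1, 0)
# ('B', 2, 0)
```

In a real expanded dataset, each of these rows would also carry the covariate values as they stood in that month, which is the reason for expanding in the first place.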

Occasional Contributor
Posts: 5

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

Hi Wendy,

 

Thanks for this, I'll watch the videos when I'm back in the office tomorrow morning.

 

As a little extra context (and you may be able to highlight what I've done incorrectly with _t_), here is the hazard function I have output. My problem is with that initial spike at 1. When the node does the regression, it treats 0 as a very favourable month, since no events occur there (it's my observation month).

 

Hazard.PNG

 

Thank you

Occasional Contributor
Posts: 5

Re: Enterprise Miner Survival Modelling Help - Time Since Observation Confusion

Hi,

Very helpful video! I actually think we used it to expand our data; I've had a look over it and it all seems to be OK.

I'm still having the issue that the _t_ = 0 month is being included in the regression as a favourable month because it has no events. My previous post may help explain my confusion.

Is there a way to avoid this, i.e. to stop my hazard starting at 0?

I realise this is probably quite confusing, but hopefully you can see where I'm coming from?

Thanks
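One thing that can be checked outside Enterprise Miner is whether the raw monthly hazards for months 1 onward depend on the _t_ = 0 rows at all. Here is a minimal Python sketch with made-up rows and a hypothetical `hazard_by_month` helper. It says nothing certain about how the Survival node's spline fit reacts; it only shows that in the raw events-over-at-risk estimate, the _t_ = 0 rows contribute the zero point and nothing else.

```python
from collections import Counter

def hazard_by_month(rows):
    """Events divided by rows at risk, for each value of t."""
    at_risk, events = Counter(), Counter()
    for _account, t, event in rows:
        at_risk[t] += 1
        events[t] += event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}

# Hypothetical expanded rows: (account_id, t, event); nobody defaults at t = 0.
rows = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0), ("B", 2, 1),
    ("C", 0, 0), ("C", 1, 0), ("C", 2, 0),
]

with_zero = hazard_by_month(rows)
without_zero = hazard_by_month([r for r in rows if r[1] > 0])

print(with_zero)     # includes a hazard of 0.0 at t = 0
print(without_zero)  # same values for t = 1 and t = 2, no entry for t = 0
```

Whether dropping or re-indexing the _t_ = 0 month is appropriate for the model itself is a separate question that this sketch does not answer.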