Enterprise Miner Survival Modelling Help - Time S...

07-25-2017 10:15 AM - edited 07-25-2017 10:15 AM

Hi,

I was hoping someone may be able to help me with some confusion around the survival node in Enterprise Miner, particularly with the way that my data is structured.

Some quick background: I'm attempting to model the time to default on a mortgage, with the event recorded as a simple binary 1/0 flag. Every observation has the _T_ variable, which is my time since observation; it starts at 0 and increases for each following month, with an end date recorded if the event mentioned above happens.

Within my data, _t_ = 0 is the initial observation month, and no accounts have hit the event at that point. As a result, I've noticed a couple of issues with the way the cubic splines are being calculated, mainly that the hazard function has a sharp spike at _t_ = 1, which is actually where my highest event rate occurs.
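To make the situation concrete, here is a minimal sketch of how an empirical discrete-time hazard behaves on data shaped like this. It is not what the Enterprise Miner survival node does internally; the account IDs, the row layout `(account_id, t, event)` and the function name `empirical_hazard` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical expanded (one row per account per month) data: (account_id, t, event).
# event is 1 only on the row for the month in which the default happens.
# By construction no account has an event at t = 0, the observation month.
rows = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0), ("B", 2, 0),
    ("C", 0, 0), ("C", 1, 0), ("C", 2, 1),
    ("D", 0, 0), ("D", 1, 1),
    ("E", 0, 0), ("E", 1, 0), ("E", 2, 0),
]

def empirical_hazard(rows):
    """Return {t: events at t / accounts with a row at t}."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _account_id, t, event in rows:
        at_risk[t] += 1
        events[t] += event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}

hazard = empirical_hazard(rows)
for t, h in hazard.items():
    print(f"t={t}: at-risk hazard = {h:.3f}")
```

Because every account contributes a row at _t_ = 0 and none of those rows carries an event, the hazard there is exactly 0, and the first non-zero value appears at _t_ = 1. Anything fitted through those points has to climb from 0 to the _t_ = 1 rate in a single step.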

Having noticed that the sharp spike is being used as part of the spline fitting, I realise this is most likely not what I want to happen. Am I correct to include _t_ = 0, with no events, in my dataset when using the survival node?

Please let me know if some more information is needed.

Thanks!

Nathan

Accepted Solutions

Solution

07-27-2017
09:32 AM

Posted in reply to nathanb_1993

07-25-2017 12:42 PM

If you have time-varying covariates, then yes, you need to use an expanded form of the data; if not, you can have just one observation per ID.
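As a rough illustration of the difference between the two layouts, the sketch below turns one-record-per-ID data into an expanded form with one row per ID per month. The field order `(account_id, time, event)` and the helper name `expand_record` are assumptions for this example only, not the column names or layout the survival node requires.

```python
def expand_record(account_id, time, event):
    """Expand one (account_id, time, event) record into monthly rows.

    time  : months from the observation month to the event or to censoring
    event : 1 if the event happened at `time`, 0 if the account was censored
    Returns rows (account_id, t, event_in_month) for t = 0..time.
    """
    rows = []
    for t in range(time + 1):
        event_in_month = 1 if (event == 1 and t == time) else 0
        rows.append((account_id, t, event_in_month))
    return rows

# Hypothetical standard-format data: one record per ID.
standard = [("A", 1, 1), ("B", 2, 0)]

expanded = [row for record in standard for row in expand_record(*record)]
for row in expanded:
    print(row)
```

In the expanded form, any covariate that changes from month to month would be attached to the matching `(account_id, t)` row, which is why that layout is the one needed for time-varying covariates.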

There are a couple of videos available that can help you with formatting your data. This one covers the standard format without time-varying covariates:

And this one for the expanded format:

All Replies

Posted in reply to nathanb_1993

07-25-2017 11:02 AM

nathanb_1993 wrote:

Hi,

I was hoping someone may be able to help me with some confusion around the survival node in Enterprise Miner, particularly with the way that my data is structured.

Some quick background: I'm attempting to model the time to default on a mortgage, with the event recorded as a simple binary 1/0 flag. Every observation has the _T_ variable, which is my time since observation; it starts at 0 and increases for each following month, with an end date recorded if the event mentioned above happens.

Without fully seeing what you're doing, my initial concern is that your data structure is not correct for survival modeling. Review the generated code; if it's PROC PHREG, your data structure will likely need to be modified. You can review the examples in the PROC PHREG documentation to see how to set up your data.

Posted in reply to Reeza

07-25-2017 11:35 AM

Hi Reeza,

Below is an image which shows 3 (simple) examples of observations in my dataset. Hopefully this will help in understanding my data setup; obviously my actual dataset is considerably larger than this! The image shows the two outcomes in my dataset: either EVENT = 1 if the event eventually happened, in which case END_DATE is populated, or EVENT = 0 at the end of the ID, in which case END_DATE is not populated. Each ID is independent.

Don't worry about the extra variables; they're just in there as an example.

As mentioned previously, there are no instances where start_date = end_date, since _t_ = 0 is the initial observation month and has no events.

Posted in reply to nathanb_1993

07-25-2017 11:37 AM

Yeah, I'm pretty sure that's not the setup required for survival analysis, at least not in Base SAS. You need a single record per ID, from what I understand. This assumes it uses PROC PHREG behind the scenes. If it doesn't, then you may be OK, but I strongly suspect this is not the correct data structure.
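For reference, going from multiple monthly rows per ID back to a single record per ID can be sketched as below. This is only an illustration of the reshaping, with a made-up row layout `(account_id, t, event)` and a hypothetical helper `collapse`; it says nothing about which layout the Enterprise Miner node actually expects.

```python
def collapse(monthly_rows):
    """Collapse (account_id, t, event) rows to one (account_id, time, event) per ID.

    time  is the last month observed for the ID
    event is 1 if any of the ID's rows carries the event flag
    """
    summary = {}
    for account_id, t, event in monthly_rows:
        last_t, had_event = summary.get(account_id, (0, 0))
        summary[account_id] = (max(last_t, t), max(had_event, event))
    return [(account_id, t, e) for account_id, (t, e) in summary.items()]

# Hypothetical monthly rows for two accounts.
monthly_rows = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 0), ("B", 2, 0),
]

one_per_id = collapse(monthly_rows)
for record in one_per_id:
    print(record)
```

Account A ends with an event at month 1, while account B is censored at month 2, so each ID is summarised by its last observed month and whether the event occurred.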

Posted in reply to Reeza

07-25-2017 11:43 AM

For Enterprise Miner, we believe we're required to set the data up in this structure, based on the SAS documentation and guidance available: we need multiple records per ID, one for each change in the _t_ variable, which is time since observation.

No worries though, I'll keep this in mind and see if anything is happening behind the scenes. Thanks for the input.

Posted in reply to WendyCzika

07-25-2017 12:50 PM

Hi Wendy,

Thanks for this, I'll have a watch of the videos when I'm back in the office tomorrow morning.

As a little extra context (and you may be able to highlight what I've done incorrectly with _t_), here is the hazard function I have currently output. My problem is with that initial spike at 1. When the node does the regression, it's treating 0 as a very favourable month, since no events occur there because it's my observation month.

Thank you

Posted in reply to WendyCzika

07-26-2017 04:50 AM

Hi,

Very helpful video! I actually think we used it to expand our data which I've had a look over and it all seems to be OK.

I'm just still having the issue of the _t_ = 0 month being included in the regression as a favourable month because there are no events; my previous post may help a little in explaining my confusion.

Is there a way this can be avoided, i.e. a way to stop my hazard starting at 0?

I realise this is probably quite confusing, but hopefully you can see where I'm coming from?

Thanks
