Enterprise Miner Survival Modelling Help - Time Since Observation Con...

Posted 07-25-2017 10:15 AM
(1543 views)

Hi,

I was hoping someone could help me with some confusion around the Survival node in Enterprise Miner, particularly with the way my data is structured.

Some quick background: I'm attempting to model the time to default on a mortgage, where the event is a simple 1 or 0 binary flag. Every observation has the _T_ variable, which is my time since observation. It starts at 0 and counts up across the observations for the following months, with an end date recorded if the event mentioned above happens.

Within my data, _t_ = 0 is the initial observation month, and no accounts have hit the event at that point. As a result, I've noticed a couple of issues with the way the cubic splines are being calculated, mainly that the hazard function has a sharp spike at _t_ = 1, which is actually where my highest event rate occurs.

Having noticed that the sharp spike is being used as part of the spline fitting, I realise this is most likely not what I want to happen. Am I correct to include _t_ = 0 with no events in my dataset when using the Survival node?

Please let me know if some more information is needed.

Thanks!

Nathan

1 ACCEPTED SOLUTION

If you have time-varying covariates, then yes you need to use an expanded form of the data, but if not, you can have just 1 obs. per ID.
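To make the two layouts concrete, here is a minimal Python sketch (not Enterprise Miner code) that turns one row per ID into one row per ID per month. The field names `id`, `duration` and `event`, the helper `expand_to_person_period`, and the sample rows are all hypothetical and chosen only for illustration; the expansion the Survival node expects may differ in detail.

```python
from typing import Dict, List


def expand_to_person_period(rows: List[Dict]) -> List[Dict]:
    """Expand one-row-per-ID records into one row per ID per month.

    Each input row is assumed to hold `id`, `duration` (last observed
    month, counted from 0) and `event` (1 = event in that last month,
    0 = censored).
    """
    expanded = []
    for row in rows:
        for t in range(row["duration"] + 1):  # months 0..duration inclusive
            is_last = t == row["duration"]
            expanded.append({
                "id": row["id"],
                "t": t,
                # Flag the event only on the final month of an ID that had it.
                "event": 1 if (is_last and row["event"] == 1) else 0,
            })
    return expanded


# Made-up standard-format data: one observation per ID.
standard = [
    {"id": "A", "duration": 1, "event": 1},  # event in month 1
    {"id": "B", "duration": 3, "event": 0},  # censored after month 3
]
person_period = expand_to_person_period(standard)
for rec in person_period:
    print(rec)
```

In the expanded form, any covariate that changes from month to month would get its own value on each row, which is why that layout is the one needed for time-varying covariates.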

There are a couple of videos available that can help you with formatting your data. This one covers the standard format without time-varying covariates:

And this one for the expanded format:

7 REPLIES

@nathanb_1993 wrote:

Hi,

I was hoping someone could help me with some confusion around the Survival node in Enterprise Miner, particularly with the way my data is structured.

Some quick background: I'm attempting to model the time to default on a mortgage, where the event is a simple 1 or 0 binary flag. Every observation has the _T_ variable, which is my time since observation. It starts at 0 and counts up across the observations for the following months, with an end date recorded if the event mentioned above happens.

Without fully seeing what you're doing, my initial concern is that your data structure is not correct for survival modeling. Review the generated code: if it uses PROC PHREG, your data structure will likely need to be modified. You can review the examples in the PROC PHREG documentation to see how to set up your data.

Hi Reeza,

Below is an image which shows 3 (simple) examples of observations in my dataset. Hopefully this will help in understanding my data setup; obviously my actual dataset is considerably larger than this! The image shows the two outcomes in my dataset: either EVENT = 1 if the event eventually happened, in which case END_DATE is populated, or EVENT = 0 at the end of the ID, in which case END_DATE is not populated. Each ID is independent.

Don't worry about the extra variables; they are just in there as an example.

As mentioned previously, there are no instances where start_date = end_date, since _t_ = 0 has no events because it is the initial observation month.
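For readers without the image, the rule described above can be sketched in a few lines of Python (again, not Enterprise Miner code). The helper names `months_between` and `summarise`, the `last_seen` field used as the censoring point, and the dates are all hypothetical; only the START_DATE / END_DATE / EVENT logic comes from the post.

```python
from datetime import date
from typing import Optional


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarise(start_date: date, end_date: Optional[date], last_seen: date) -> dict:
    """Return time since observation and the event flag for one ID.

    `last_seen` is a hypothetical field: the last month the ID was
    observed, used as the censoring point when END_DATE is missing.
    """
    if end_date is not None:
        # END_DATE populated: the event happened in that month.
        return {"t": months_between(start_date, end_date), "event": 1}
    # END_DATE missing: the ID is censored at its last observed month.
    return {"t": months_between(start_date, last_seen), "event": 0}


# Made-up examples: one ID with the event, one without.
defaulted = summarise(date(2016, 1, 1), date(2016, 2, 1), date(2016, 2, 1))
censored = summarise(date(2016, 1, 1), None, date(2016, 12, 1))
print(defaulted)
print(censored)
```

Because start_date never equals end_date, an ID with EVENT = 1 always ends up with _t_ of at least 1 under this rule.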

For Enterprise Miner, we believe we are required to set the data up in this structure, based on the SAS documentation and guidance available: we need multiple records per ID, one for each change in the _t_ variable, which is time since observation.

No worries though, I'll keep this in mind and see if anything is happening behind the scenes. Thanks for the input.

Hi Wendy,

Thanks for this, I'll watch the videos when I'm back in the office tomorrow morning.

As a little extra context (and you may be able to highlight what I've done incorrectly with _t_), here is my current hazard function output. My problem is with the initial spike at 1. When the node does the regression, it treats 0 as a very favourable month (since no events occur there, as it's my observation month).

Thank you
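The shape described in this post can be reproduced with a small Python sketch of an empirical discrete hazard (events in month t divided by IDs at risk in month t). This is not how the Survival node computes its hazard; the helper `empirical_hazard` and the data are made up for illustration. The last two lines show the same data with the _t_ = 0 rows left out, only to show what the series looks like in that case; whether that is the right fix inside Enterprise Miner is not settled in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def empirical_hazard(rows: List[Tuple[str, int, int]]) -> Dict[int, float]:
    """Discrete hazard per month: events in month t / IDs at risk in month t.

    Each row is (id, t, event) from expanded data, one row per ID per month.
    """
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _id, t, event in rows:
        at_risk[t] += 1
        events[t] += event
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Made-up data: four IDs, none of which can have the event in month t = 0.
rows = [
    ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 0), ("B", 1, 1),
    ("C", 0, 0), ("C", 1, 0), ("C", 2, 1),
    ("D", 0, 0), ("D", 1, 0), ("D", 2, 0),
]
hazard = empirical_hazard(rows)
print(hazard)

# Same data with the t = 0 rows left out: the series now starts at t = 1.
hazard_from_1 = empirical_hazard([r for r in rows if r[1] >= 1])
print(hazard_from_1)
```

With month 0 included, the hazard is 0 at _t_ = 0 by construction and then jumps at _t_ = 1, which is the kind of step a smooth spline has to bend around.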

Hi,

Very helpful video! I actually think we used it to expand our data, which I've had a look over, and it all seems to be OK.

I'm still having the issue of the _t_ = 0 month being included in the regression as a favourable month because there are no events; the post prior to this may help a little in explaining my confusion.

Is there a way this can be avoided, i.e. a way to avoid my hazard starting at 0?

I realise this is probably quite confusing, but hopefully you can see where I'm coming from?

Thanks
