adjgiulio
Obsidian | Level 7

I have a dataset in which every observation shares a common set of variables. Each observation also has a set of time series variables, and the length of the series varies from observation to observation.

For instance, the maximum length of the time series is 36. A member who churned after 6 months would have 6 of the 36 time series data points (the remaining 30 would be missing values). Another member who churned after 22 months would have 22 of the 36 data points.

Something like this:

obs   age   gender   t1   t2   t3   t4   ...   t36
1     23    0        9    8    3    .    ...   .
2     54    1        8    8    .    .    ...   .
3     34    1        5    5    6    4    ...   8

I want to create an ensemble model in which a separate model is fitted to each subgroup of members, with subgroups defined by the length of their time series. To do so, I need to be able to change the role of the unused time series variables to rejected.

That can be done manually with an endless series of Metadata nodes, but I'd like a more flexible, code-driven solution. Is that possible?

Thanks,


G

1 ACCEPTED SOLUTION

DougWielenga
SAS Employee

There is no way to use a variable for some observations in a model while ignoring it for other observations in the same model, so modifying the metadata would not help. You can use the %EM_METACHANGE macro to change metadata in a SAS Code node, but I would not recommend doing so in this instance. There are many ways to handle churn modeling, depending on the amount of data you have available.

 

One approach is to choose an observation window during which all people of interest are present (e.g. T1-T6), then choose a later window in which to determine who has churned (e.g. T8-T9). The intermediate time period (T7 in my example) makes the model look for churn further in the future, which allows time for an intervention that might prevent it.
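The windowing described above can be sketched in a few lines of Python. This is only an illustration, not Enterprise Miner code. The function names are hypothetical, and the labeling rule is an assumption: a member counts as churned when their first missing month falls inside the outcome window, and members who leave before that window opens are dropped.

```python
def tenure(series):
    """Count the leading non-missing months (None marks a missing value)."""
    n = 0
    for value in series:
        if value is None:
            break
        n += 1
    return n


def window_example(series, obs_end=6, gap=1, outcome_len=2):
    """Build one training row from a member's monthly series t1..t36.

    Defaults mirror the example: observe T1-T6, skip T7, label on T8-T9.
    Returns (features, churned), or None when the member cannot be used.
    """
    months_present = tenure(series)
    outcome_start = obs_end + gap              # months elapsed before T8 begins
    outcome_end = outcome_start + outcome_len  # months elapsed once T9 ends

    # Assumption: members whose first missing month comes before the outcome
    # window (they left during the observation or gap period) are excluded.
    if months_present < outcome_start:
        return None

    features = series[:obs_end]
    # Assumption: churned means the first missing month is inside T8-T9.
    churned = 1 if months_present < outcome_end else 0
    return features, churned


# A member with 22 of 36 months is still present through T9, so churned = 0.
print(window_example([5] * 22 + [None] * 14))
# A member with 8 of 36 months is first missing in T9, so churned = 1.
print(window_example([5] * 8 + [None] * 28))
```

Every usable row has the same six inputs, so no variable needs a different role for different members.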

 

Another approach is to look at the last several months leading up to a churn or non-churn. People with data through T36 have not churned (at least to our knowledge). With this approach, the last available observation is treated as T1 (lag 1) even though it represents, for example, time period 12 for someone who churned in time period 13. You might lose some historical data, but it is likely more important to have access to data from the time periods near when the customer churned. This latter approach provides more flexibility, which can create more data, but it confounds the actual response with effects that might have changed over the various time periods.
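The realignment in this second approach could look like the following Python sketch. Again, this is an illustration rather than Enterprise Miner code. The function name and the choice of k are hypothetical, and it assumes that missing values appear only at the end of a series and that a member with any missing month has churned.

```python
def last_k_lags(series, k=3):
    """Realign a member's series so the most recent observed month is lag 1.

    Returns (lags, churned) with lags[0] as lag 1, or None when the member
    has fewer than k observed months.
    """
    # Assumption: missing values (None) occur only after the last observed month.
    observed = []
    for value in series:
        if value is None:
            break
        observed.append(value)

    if len(observed) < k:
        return None

    lags = observed[::-1][:k]  # newest month first: lag 1, lag 2, ...
    # Assumption: a series that stops before the final period means churn.
    churned = 1 if len(observed) < len(series) else 0
    return lags, churned


# Member 1 from the question (t1=9, t2=8, t3=3, then missing):
print(last_k_lags([9, 8, 3] + [None] * 33, k=3))  # ([3, 8, 9], 1)
```

Because lag 1 can correspond to a different calendar period for each member, period-specific effects get mixed into the inputs, which is the confounding mentioned above.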

In either case, changing the metadata will not address your concern.


Hope this helps!

Doug
