shilpaISBCBA
Fluorite | Level 6

I have integer variables (e.g., number of dependants, number of bills, etc.) with high logworth.

When the rules are applied, these attributes get split at a fraction (10.5, 51.5, etc.), which doesn't make sense as per business logic.

 

Is there a way in Enterprise Miner to enforce the split at an integer value instead of a fraction? I do not want to use the interactive tree option. Please suggest other alternatives, if any.

 

 

1 ACCEPTED SOLUTION

DougWielenga
SAS Employee

Is there a way in Enterprise Miner to enforce the split at an integer value instead of a fraction? I do not want to use the interactive tree option. Please suggest other alternatives, if any.

 

The splitting rules are created based on the measurement level of the variable. You are getting splits between the integer values because the variable is specified as an Interval variable. You can model a variable like this as Interval, Ordinal, or Nominal, but there is no way to specify a variable as an integer. Among the three options, Interval is typically the best choice for the following reasons:

 

    * specifying Ordinal sacrifices the relative size information (e.g. 2 children is twice as many as 1 child), since it only focuses on the order of the levels.

 

    * specifying Nominal sacrifices the ordering of levels (e.g. if your values were 1, 4, 5, 11, 20, then the Nominal ordering is 1, 11, 20, 4, 5), since it only treats them as different but unordered levels.

 

    * specifying either Nominal or Ordinal can decrease the information available for scoring when the training data does not have every possible value, since new levels that were not in the training data will be treated as if they were 'missing'. For example, suppose your training data had values ranging from 1-6 and 8-11 for number of children, but the scoring data had someone with 7 children. This is handled easily when the variable is treated as Interval, but 7 is a previously unknown level under Ordinal/Nominal because the training data did not have anyone with 7 children.

 

    * for other parameter-based modeling methods such as Regressions or Neural Networks, you can often model the relationship with fewer parameters when you treat the variable as an Interval variable, whereas both Nominal and Ordinal variables require at least k-1 parameters for k levels in the training data.
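
The points above can be checked with a small sketch. This is plain Python, not Enterprise Miner code, and every name in it (integer_rule, split_left, the sample values) is hypothetical. It assumes a tree rule of the form "x < threshold" and assumes Nominal levels are compared as character strings. Under those assumptions, a split at 10.5 on an integer variable selects exactly the same records as the rule "x <= 10", so the fractional value can be restated as an integer when reporting the rule to the business.

```python
import math


def integer_rule(threshold):
    """Largest integer that satisfies 'x < threshold' (hypothetical helper)."""
    return math.ceil(threshold) - 1


def split_left(values, threshold):
    """Records sent to the left branch by a rule of the form 'x < threshold'."""
    return [v for v in values if v < threshold]


# Made-up integer values for a variable such as number of dependants.
dependants = [0, 3, 7, 10, 11, 12, 25]

# 1) The fractional split and its integer restatement pick the same records.
cutoff = integer_rule(10.5)
left_fractional = split_left(dependants, 10.5)
left_integer = [v for v in dependants if v <= cutoff]

# 2) Comparing levels as text loses the numeric order (assumption: string sort).
levels = [1, 4, 5, 11, 20]
text_order = sorted(str(v) for v in levels)

# 3) A value absent from training is a new level for a categorical variable,
#    but an interval split still places it on one side of the threshold.
training_levels = set(range(1, 7)) | set(range(8, 12))
is_new_level = 7 not in training_levels
interval_branch = "left" if 7 < 10.5 else "right"

print(cutoff, left_fractional == left_integer)
print(text_order)
print(is_new_level, interval_branch)
```

In other words, keeping the variable as Interval and reading "< 10.5" as "10 or fewer" gives an integer rule without changing which records fall in each branch.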

 

Hope this helps!

 

Cordially,

Doug

