Zachary
Obsidian | Level 7

What is the best and most efficient way to save a Tree Diagram in Enterprise Miner (EM) and apply it to 100% of the data for final results? I want to keep my nodes as static as I can, as easily as possible.

I am starting out using an 80/20 split. This might move closer to 60/40, but we will see. On Monday I think we will have our model finalized. Then I would like it applied to 100% of my original data.

It would also help if EM generated code that can be used directly in Enterprise Guide. Below is an example of some of the code generated by EM:

Node = 166
*------------------------------------------------------------*
if PURE_PREMIUM >= 5684.5 or MISSING
AND PAYROLL < 693492
AND HAZARD_CODE <= D
AND Business Unit IS ONE OF: 2, 3 or MISSING
AND BLEND_GROSS_LOAD2 >= 149 or MISSING
AND BLEND_GROSS_LOAD1 < 40.5 or MISSING
then
Tree Node Identifier   = 166
Number of Observations = 236
Predicted: D_GROSS_LOADED_WITH_TREND=1 = 0.54
Predicted: D_GROSS_LOADED_WITH_TREND=0 = 0.46

That does not help me much. I would prefer something that can be used within standard SAS code. For example: if (PURE_PREMIUM >= 5684.5) or (PURE_PREMIUM = .) then ...;

Perhaps I am missing the option to create this code within EM. Thank you.
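For illustration only, the numeric conditions of the Node 166 rule above could be hand-translated into a DATA step roughly like this. This is a minimal sketch, not what EM generates: the input table `work.policies`, the output table `work.node166_check`, and the flag `in_node166_numeric` are hypothetical names, and the HAZARD_CODE and Business Unit conditions are left out because their coding is not shown in the rule text. It also assumes that a condition without "or MISSING" is meant to exclude missing values.

```sas
/* Sketch: flag rows that satisfy the numeric conditions of Node 166.   */
/* work.policies and in_node166_numeric are hypothetical names.         */
data work.node166_check;
  set work.policies;

  /* "or MISSING" in the EM rule becomes an explicit missing() test.    */
  /* PAYROLL has no "or MISSING", so missing values are excluded here;  */
  /* otherwise SAS would treat a missing value as less than 693492.     */
  in_node166_numeric =
        (PURE_PREMIUM >= 5684.5 or missing(PURE_PREMIUM))
    and (not missing(PAYROLL) and PAYROLL < 693492)
    and (BLEND_GROSS_LOAD2 >= 149 or missing(BLEND_GROSS_LOAD2))
    and (BLEND_GROSS_LOAD1 < 40.5 or missing(BLEND_GROSS_LOAD1));
run;
```

Writing every leaf out by hand this way is error-prone for a full tree, which is why generated score code is usually the safer route.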

1 ACCEPTED SOLUTION
Reeza
Super User

EM does generate the full score code that can be used in EG.

I believe there's a score code node that generates the code. What version of EM are you on?

Zachary
Obsidian | Level 7

I am on 13.1, and yes I found the score code node. Thank you very much.

Do you recommend I use the Optimized SAS Code or just the regular SAS Code? What are the differences?

Also, there are a lot of code statements in there that I am not familiar with. Do you suggest I just have my code point to my data file/library then just run all of this code that was generated? Or do you recommend anything else? Below are some of the initial statements in my code - not making a lot of sense to me right now:

_ARBFMT_12 = PUT( BU , BEST12.);
%DMNORMIP( _ARBFMT_12);
IF _ARBFMT_12 IN ('1' ) THEN DO;
  IF  NOT MISSING(PURE_PREMIUM ) AND
                  5112.5 <= PURE_PREMIUM  THEN DO;
    _ARBFMT_12 = PUT( SMNQ_D_POST_CODE , BEST12.);
     %DMNORMIP( _ARBFMT_12);
    IF _ARBFMT_12 IN ('3' ) THEN DO;
      _NODE_  =                   81;
      _LEAF_  =                   21;

May I ask for an example of how the Score Code node is incorporated with EG? I am a little nervous because I plan on applying this code to a new dataset with the same variables. I am much more accustomed to simple code.

Thank you again.

Reeza
Super User

It shouldn't matter which version of the code you use.

Some of the stuff at the top is transformations that may have occurred in various steps of the analysis. 

To use this in EG, create a program as follows; that should do what you want.

data Score;
  set <your data>;                      /* the table you want to score */
  <insert code from Enterprise Miner>   /* paste the generated score code here */
run;
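As an alternative to pasting, the generated score code can be saved to a file and pulled into the DATA step with `%include`. The following is a minimal sketch under stated assumptions: the library path, the table name `full_population`, and the file path are hypothetical, and it assumes the macros called by the score code (such as `%DMNORMIP`) resolve in the SAS session that EG is connected to.

```sas
/* Sketch: score a full data set with EM score code saved to a file.    */
/* All paths and table names below are hypothetical placeholders.       */
libname mydata "/path/to/data";
filename scorecd "/path/to/em_score_code.sas";

data work.scored;
  set mydata.full_population;   /* 100% of the original data */
  %include scorecd;             /* inserts the EM score code at this point */
run;

/* Quick check: how many rows landed in each tree node.                 */
proc freq data=work.scored;
  tables _NODE_;
run;
```

Keeping the score code in its own file means it is never edited by hand, so the tree applied to the full data is exactly the one that was exported.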

Zachary
Obsidian | Level 7

Thank you again for all of the valuable advice.

But I have a huge problem. My diagram is rather simple: a data node, a data partition node, then a decision tree node. A colleague and I were interactively changing the results of the decision tree node to align better with what our results should substantively say. We did this using the interactive feature of the decision tree node, then closed it. We changed a few more things and eventually re-ran the decision tree node to more or less start over. But that also re-ran the data partition, and we cannot figure out why.

Before I fully implement the code generator I would like to somehow lock down the other nodes. Is there a way in EM to ensure that nothing will change?

M_Maldonado
Barite | Level 11

Enterprise Miner nodes re-run only the parts that need to be re-run. For example, if you change a property under the Report section, only the code involved in reporting will re-run.

In your example, if you don't change any properties on the Data Partition node, the green wheel might make it look like it's running, but it is just checking that some tables or results exist. It is not re-running the whole thing.

I would need to double-check, but I think that if you are using Interactive mode to grow your tree, you have to save it, close it, and not change any properties. If you are going to re-run things, I would suggest setting the Use Frozen Tree property to Yes.

As a bonus, challenge your interactive tree with some other trees. I would use the following and compare their subtree assessment plots and their fit statistics with a Model Comparison node:

  • Largest tree, just to confirm that the largest tree is an overtrained model
  • Default tree (maxdepth 6)
  • Tree with maximum depth 10

Good luck,

Miguel

rayIII
SAS Employee

Zachary,

Also check that the Rerun property is set to No for your datasource node. If it is set to Yes, that would explain why the Partition node is running each time you execute the flow.

Ray

Zachary
Obsidian | Level 7

It was already set to No by default. Thank you for the suggestion there.

M_Maldonado
Barite | Level 11

You got what you needed, good to go? How does your tree beat a default tree?
