This topic is solved and locked.
wave43
Obsidian | Level 7

I have a large project in which I generated 400+ decision trees. I need to export the SAS Node Leaf data set automatically from a node, because I do not want to click "Save as" 400+ times. Is there a way to export it as part of the model? Placing the control point at the end of all the paths has been a great way to generate all the trees "at once". Can I insert a node before the control point that exports the leaf data sets?

 

Also, is there a way to do the same with the output that contains the fit statistics?

 

Thanks.


12 Replies
PadraicGNeville
SAS Employee

The EM Decision Tree node saves the tree model in a data set that PROC ARBOR understands.  PROC ARBOR can output information about all the nodes in the tree.  It has no output just for the leaves.

 

In an EM code node,

 

PROC ARBOR INMODEL= &em_lib..treeZ_emtree;
MACRO NLEAVES=nleaves;
SAVE SEQUENCE=sequence NODESTATS=nodes  STATSBYNODE=statsbynode;
QUIT;

inputs the tree &em_lib..treeZ_emtree, where TreeZ is the node ID for the Decision Tree node, and outputs
a macro variable, NLEAVES, equal to the number of leaves in the tree, and three data sets:

 

sequence: each row has fit statistics for a subtree with a specific number of leaves,
nodes: each row contains statistics for a node (including leaves) in the current subtree (with &NLEAVES leaves)
statsbynode: node statistics re-arranged to allow for more statistics.

 

(I am not 100% sure about the macro name, &em_lib..treeZ_emtree.)

Is this what you need to know? 

 

 

wave43
Obsidian | Level 7

This is exactly what I need! However, when I run this code in the SAS Code node, I do not see the three data sets you mentioned. I only see the files shown below. Any thoughts? Thanks.

[Screenshot: File Names.png]

PadraicGNeville
SAS Employee

I would first look at the log.  I do not know enough about the EM interface to find the log.  If no one else responds, try contacting tech support.

 

wave43
Obsidian | Level 7

The log shows an error when it looks for the EMTREE data set.

 

Maybe I don't understand your notation for "TREEZ", but I tried substituting the node ID (Tree38) for TreeZ:

 

ERROR: File WORK.EMWS15TREE38_EMTREE.DATA does not exist

 

and when I leave TREEZ as is:

 

ERROR: File WORK.EMWS15TREEZ_EMTREE.DATA does not exist

 

Either way, I don't see any *EMTREE* data in my directories.

 

Any idea?


PadraicGNeville
SAS Employee

You're helping me more than I am helping you now.   The EM macro, directory, and file names are a mystery to me.  I'm sorry I can't help more.

WendyCzika
SAS Employee

Make sure you use 2 periods before tree38:

 

inmodel=&em_lib..tree38_emtree
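Why the second period matters: in a macro variable reference, a period ends the variable name and is consumed. With a single period the libref and the member name run together into one name, and SAS then looks for that one-level name in WORK, which matches the WORK.EMWS15TREE38_EMTREE error above. A minimal sketch for a plain SAS session follows; the %LET only stands in for the value EM assigns (EMWS15 in this thread), so it should not be used inside an EM SAS Code node, where EM_LIB is already defined.

```sas
/* Stand-in for the value EM assigns to EM_LIB (EMWS15 in this thread).  */
/* Inside an EM SAS Code node EM_LIB already exists: skip this %LET.     */
%let em_lib = EMWS15;

/* One period: it only terminates the macro variable name.              */
%put One period: &em_lib.tree38_emtree;   /* EMWS15tree38_emtree  */

/* Two periods: the first terminates the name, the second is kept.      */
%put Two periods: &em_lib..tree38_emtree; /* EMWS15.tree38_emtree */
```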

wave43
Obsidian | Level 7

Yes, that was my issue! However, I still don't see the files for "node", "sequence", or "nodes" in my EM_LIB directory. Those should be the names of the data sets, correct?

 

Thank you! 

PadraicGNeville
SAS Employee

The log should report the name of each data set and its number of observations shortly after it echoes the PROC ARBOR and SAVE statements. Remember to replace "sequence" with a libname.memname of your choice, for example, EM_LIB.Sequence.

wave43
Obsidian | Level 7

Thanks. I tried the prefixes "&em_lib..", "&em_lib.", and "em_lib."; the version below uses "&em_lib..":

PROC ARBOR INMODEL= &em_lib..Tree38_emtree;
MACRO NLEAVES= &em_lib..nleaves;
SAVE SEQUENCE= &em_lib..sequence NODESTATS= &em_lib..nodes  STATSBYNODE= &em_lib..statsbynode;
QUIT;

 

The log shows the following (with no data set location):

 

14896  %let syscc = 0;
14897  %inc "/xxxx/Projects/xxxx Data Mining 2015/Predictive Models/Workspaces/EMWS15/EMCODE/EMTRAINCODE.sas";
NOTE: %INCLUDE (level 1) file /xxxx/Projects/xxxx Data Mining 2015/Predictive Models/Workspaces/EMWS15/EMCODE/EMTRAINCODE.sas is file /xxxxx/Projects/xxxxx Data Mining 2015/Predictive Models/Workspaces/EMWS15/EMCODE/EMTRAINCODE.sas.
14898 +PROC ARBOR INMODEL= &em_lib..Tree38_emtree;
NOTE: 1654978 kilobytes of physical memory.
NOTE: The subtree sequence contains 5 subtrees. The largest has 9 nodes and 5 leaves.
14899 +MACRO NLEAVES= &em_lib..nleaves;
14900 +SAVE SEQUENCE= &em_lib..sequence NODESTATS= &em_lib..nodes  STATSBYNODE= &em_lib..statsbynode;
14901 +QUIT;
NOTE: %INCLUDE (level 1) ending.
14902  *------------------------------------------------------------*;
14903  * Close any missing semi colons;
14904  *------------------------------------------------------------*;
14905  ;
14906  ;
14907  ;
14908  ;
14909  quit;

wave43
Obsidian | Level 7

I received an email asking if I have a solution. I still do not. I am trying many options including the ones I highlighted in my previous post.


WendyCzika
SAS Employee

I believe the correct names of the data sets are actually TreeZ_outseq, TreeZ_outnodes, and TreeZ_outstats (assuming that is what corresponds to statsbynode). You can see all the data sets from the Decision Tree node by opening the Explorer window inside EM, checking the Show Project Data box, and then navigating to your workspace folder (e.g., Emws15). All the data sets with the Tree prefix are the ones created by the Decision Tree node.
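For the original goal of exporting these tables for 400+ trees without clicking "Save as" each time, one option is a SAS Code node placed after the Decision Tree nodes (for example, after the control point) that copies every TreeZ_outnodes, TreeZ_outseq, and TreeZ_outstats table from the workspace library to a permanent library in one step. This is an untested sketch: it assumes the table names given above and that &EM_LIB is the workspace libref, and the TREEOUT library, its path, and the macro name EXPORT_TREE_TABLES are hypothetical placeholders.

```sas
/* Hypothetical permanent library for the exported tables: change path. */
libname treeout "/your/export/folder";

/* Hypothetical macro name; assumes &EM_LIB is the workspace libref.   */
%macro export_tree_tables;
  %local tree_out;
  %let tree_out = ;

  /* Collect the Decision Tree output tables from the workspace.       */
  /* DICTIONARY.TABLES stores LIBNAME and MEMNAME in upper case.       */
  proc sql noprint;
    select memname into :tree_out separated by ' '
      from dictionary.tables
      where libname = upcase("&em_lib")
        and memtype = 'DATA'
        and memname like 'TREE%'
        and (memname like '%OUTNODES'
          or memname like '%OUTSEQ'
          or memname like '%OUTSTATS');
  quit;

  /* Copy all matching tables at once; only note it if none matched.  */
  %if %length(&tree_out) > 0 %then %do;
    proc copy in=&em_lib out=treeout memtype=data;
      select &tree_out;
    run;
  %end;
  %else %put NOTE: No Tree*_OUT* tables found in &em_lib..;
%mend export_tree_tables;

%export_tree_tables
```

Going by the PROC ARBOR description earlier in the thread, the outseq tables presumably hold the subtree fit statistics asked about in the original question. If flat files are needed rather than SAS data sets, PROC EXPORT with DBMS=CSV could be called once per table instead of PROC COPY. With 400+ trees the name list is roughly 1,200 entries, which should still fit within the 65,534-character limit of a macro variable.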

wave43
Obsidian | Level 7

Thank you!! Exactly what I was looking for. I can't believe how hard I made this.


Discussion stats
  • 12 replies
  • 2245 views
  • 1 like
  • 3 in conversation