juanvg1972
Pyrite | Level 9

Hi,

I want to build a decision tree model with SAS. I don't know whether I can do it with Enterprise Guide, but I didn't find any task for it.

Is Enterprise needed? Can I do it with a Base SAS proc?

I want to build and use a model with decision tree algorithms.

Something similar to this logistic regression, but with a decision tree:

/* Build the  model1 */

proc logistic data=entreno outmodel=model1;

class cod_tarifa;

model hc_consumo=cod_tarifa cod_segmento;

quit;

/* using model1 */

proc logistic inmodel=model1 ;

score data=test out=test1;

quit;


Thanks in advance.

1 ACCEPTED SOLUTION

jakarman
Barite | Level 11

Gergely, SPLIT and DMSPLIT also exist in Enterprise Miner. HPSPLIT is an improvement on all of those and is production. HPSPLIT is part of SAS/STAT, not EM (my mistake).

Some decisions of SAS on procs/licenses are difficult to understand. 

---->-- ja karman --<-----


17 REPLIES 17
jakarman
Barite | Level 11

PROC ARBORETUM is, unfortunately, part of the EM (Enterprise Miner) license. PROC LOGISTIC is not part of Base SAS either; it belongs to SAS/STAT: SAS/STAT(R) 13.1 User's Guide
http://support.sas.com/documentation/onlinedoc/miner/em43/allproc.pdf   http://support.sas.com/resources/papers/proceedings11/155-2011.pdf

If you have SAS VA, a decision tree is also available there as an option: SAS(R) Visual Analytics 7.1: User's Guide

---->-- ja karman --<-----
juanvg1972
Pyrite | Level 9

Thanks, Gergely.

I have seen that you can create a model (decision tree) with proc hpsplit and apply it.

For example:

proc hpsplit data=sashelp.hmeq maxdepth=7 maxbranch=2;

target BAD;

input DELINQ DEROG JOB NINQ REASON / level=nom;

input CLAGE CLNO DEBTINC LOAN MORTDUE VALUE YOJ  / level=int;

prune misc / N <= 10;

partition fraction(validate=0.2);

code file='hpsplhme-code.sas';

run;

data scored;

set sashelp.hmeq;

%include 'hpsplhme-code.sas';

run;


My question is: is it a reliable proc? Can I use it instead of ARBORETUM? I don't have an Enterprise Miner license.

If you could tell me about the differences...

Thanks,

Juan

gergely_batho
SAS Employee

Yes, HPSPLIT is a reliable proc. It is production. It is part of SAS/STAT, so you don't need an Enterprise Miner license to use it.

HPSPLIT is a newer procedure than ARBORETUM. Direct use of the latter is not officially supported by SAS; it is supported only when used via Enterprise Miner nodes. (This does not mean it doesn't work, only that you cannot open a ticket with SAS.) You also need an EM license to use PROC ARBORETUM.

There are many common and distinct features. I don't know of a document that would summarize those.

juanvg1972
Pyrite | Level 9

Hi Jaap,

I am having problems running PROC HPSPLIT and would like to ask for help:

23         proc hpsplit data=test maxdepth=8 maxbranch=2 codefile='modelo_dtree.sas';

24           target baja;

25           input comp_pend edad ;

26        

27         run;

ERROR: HPSPLIT was unable to open the code file for output.

I have tried with the full path '\home\user\sas\models\modelo_dtree.sas' but it doesn't work.

Is there any other way to save the model, like outmodel= in PROC LOGISTIC?

Can you or anybody help me?

Thanks

gergely_batho
SAS Employee

Check your permission settings on the UNIX folder. Who is the owner (user name) of the SAS process?

No, there is no outmodel=, or similar.

Try with this:

codefile="%sysfunc(pathname(work))/model_dtree.sas"

This writes the file to the location of the WORK library. You certainly have write permission there.
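A minimal sketch of how that suggestion could be checked, assuming SASHELP.CLASS as stand-in data and a hypothetical macro variable named codepath (neither comes from the original question). It prints the resolved WORK path to the log, writes the score code there with the CODE statement, and then uses the FILEEXIST function to confirm that the file was actually created:

```sas
/* Hypothetical macro variable holding the score-code path inside WORK */
%let codepath = %sysfunc(pathname(work))/model_dtree.sas;
%put NOTE: Score code will be written to &codepath;

/* Stand-in data: SASHELP.CLASS instead of the poster's own table */
proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2;
  target sex;
  input height;
  code file="&codepath";
run;

/* FILEEXIST returns 1 if the file exists, 0 otherwise */
%put NOTE: Score code file exists = %sysfunc(fileexist(&codepath));
```

Keep in mind that WORK is cleared when the SAS session ends, so a score file that has to survive the session should be copied to a permanent folder where the SAS process has write permission.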

juanvg1972
Pyrite | Level 9

Thanks Gergely,

Now it works properly. Is there any way to visualize the tree of the model? A graphic with the branches and nodes...

I am doing this:

proc hpsplit data=test maxdepth=8 maxbranch=2;

  target baja;

  input comp_pend ;

  code file="%sysfunc(pathname(work))/model_dtree.sas";

run;

/* Apply the model: */

data validar2;

  set validar;

  %include "%sysfunc(pathname(work))/model_dtree.sas";

run;

Thanks!!

gergely_batho
SAS Employee

In SAS/STAT 14.1 HPSPLIT will have nice diagrams:

https://support.sas.com/rnd/app/stat/papers/2015/SAS1940_stokes.pdf

Right now you have to assemble the graph yourself from the NODESTATS= datasets.

juanvg1972
Pyrite | Level 9

Hi Gergely,

I am working with SAS OnDemand for Academics in Enterprise Guide.

I can work with proc hpsplit in SAS/STAT module.

I have tested the methods explained in the document you mentioned (SAS1940_stokes.pdf).

They don't work in my version; statements like MODEL or CLASS don't exist in my version:

(screenshot: hpsplit.png)

I can run this properly:

proc hpsplit data=test maxdepth=4 maxbranch=2;

  target res_campaña; /* variable to predict */

  input tipo_cliente compras_3m compras_12m; /* variables used to make the prediction */

  code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */

run;

I would like to obtain the graph of the decision tree, the ROC curve and the cross validation matrix.

Is that possible?

Thanks in advance,

gergely_batho
SAS Employee

Hi,

Via the link below you can find the User's Guide: High-Performance Procedures for all versions of SAS:

SAS/STAT

Check your version and look at the appropriate description.

It seems that in SAS/STAT 14.1 the HPSPLIT syntax will be more similar to the other SAS/STAT procedures (CLASS statement, MODEL statement).

You can write the predictions of HPSPLIT to a data set, then use standard tools to produce a ROC curve (PROC SGPLOT) or a CV matrix (PROC MEANS).

You can also use PROC LOGISTIC with the NOFIT option to produce the ROC curve "automatically":

SAS/STAT(R) 13.2 User's Guide
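A rough sketch of the PROC LOGISTIC route, not tested against your version. It assumes SASHELP.CLASS as stand-in data, a hypothetical output table named scored, and that the HPSPLIT score code creates a predicted-probability variable named P_SexF for the level 'F'. The MODEL statement has no effects and uses NOFIT, so nothing is fitted; the ROC statement with PRED= just evaluates the supplied probabilities:

```sas
ods graphics on;

/* Fit the tree and write the score code to the WORK folder */
proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2;
  target sex;
  input height;
  code file="%sysfunc(pathname(work))/model_dtree.sas";
run;

/* Score the data; P_SexF is assumed to be created by the score code */
data scored;
  set sashelp.class;
  %include "%sysfunc(pathname(work))/model_dtree.sas";
run;

/* NOFIT: no model is fitted, the ROC statement uses the tree's probabilities */
proc logistic data=scored;
  model sex(event='F') = / nofit;
  roc 'Decision tree' pred=P_SexF;
run;
```

The curve here is computed on the same rows the tree was trained on, so it will look optimistic; for an honest estimate, score a separate validation table instead.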

To produce tree graphs you could:

- Use PROC NETDRAW. It is part of SAS/OR (I think it is available on SAS OnDemand for Academics): SAS/OR(R) 13.2 User's Guide: Project Management

- Calculate the layout of the tree, then use lines, circles, polygons, etc. to draw it (SGPLOT). Not easy.

- Output the structure of the tree to a text file, then use a graph drawing tool like GraphViz. (Sorry, I know it is not available on SAS OnDemand for Academics, but if you download that text file, you can run a locally installed GraphViz on it.)

- Generate interactive tree with SAS/GRAPH: SAS/GRAPH(R) 9.4: Java Applets and ActiveX Control User's Guide   (Check technical requirements here.)

juanvg1972
Pyrite | Level 9

Thanks very much Gergely,

Can you give an example of a ROC curve using PROC SGPLOT? Do I have to use the scored table?

In my case, that is the validar_res table:

proc hpsplit data=test maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats saves the tree */

  target res_campaña; /* variable to predict */

  input tipo_cliente edad compras_3m compras_12m; /* variables used to make the prediction */

  code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */

  rules file="/home/juanvg1972/ficheros/rules_dtree.txt"; /* rules applied */

run;

/* Apply the model: */

data validar_res;

  set validar;

  %include "%sysfunc(pathname(work))/model_dtree.sas";

run;

Thanks,

gergely_batho
SAS Employee

Hi,

Here's an example using sashelp.class (not thoroughly tested):

proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats saves the tree */

  target sex; /* variable to predict */

  input height; /* variables used to make the prediction */

  code file="c:\temp\model_dtree.sas"; /* saves the fitted model */

  rules file="c:\temp\rules_dtree.txt"; /* rules applied */

run;

/* Apply the model: */

data valid;

  set sashelp.class;

  %include "c:\temp\model_dtree.sas";

run;

proc means data=valid noprint nway;

  var P_SexM;

  class sex;

  output out=outmeans n=n;

run;

proc sort data=valid out=valid_sorted;

  by descending P_SexF;

run;

data valid_cum;

  if _N_=1 then do;

  TP=0;FP=0;sens=0;_1mspec=0; /* first row is the (0,0) start of the ROC curve */

  set outmeans(where=(sex='F') keep=sex _FREQ_ rename=(_FREQ_=FN));/*initially FN=Num of females*/

  set outmeans(where=(sex='M') keep=sex _FREQ_ rename=(_FREQ_=TN));/*initially TN=Num of males*/

  output;

  end;

  set valid_sorted end=last;

  TP+(sex='F');FP+(sex~='F');

  FN+(-(sex='F'));TN+(-(sex~='F'));

  sens=TP/(TP+FN);

  _1mspec=1-(TN/(TN+FP));

  output;

run;

proc sgplot data=valid_cum;

  series x=_1mspec y=sens;

run;
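For the classification matrix, a minimal sketch that builds on the valid table created above. The 0.5 cutoff and the names valid_pred and pred_sex are assumptions for illustration only. It turns the predicted probability P_SexF into a predicted class and cross-tabulates it against the actual class with PROC FREQ:

```sas
/* Derive a predicted class from the tree's probability (assumed cutoff: 0.5) */
data valid_pred;
  set valid;
  length pred_sex $1;
  if P_SexF >= 0.5 then pred_sex = 'F';
  else pred_sex = 'M';
run;

/* Rows: actual class, columns: predicted class, cells: counts only */
proc freq data=valid_pred;
  tables sex*pred_sex / norow nocol nopercent;
run;
```

A different cutoff changes the counts, which is exactly what the ROC curve above traces as the threshold moves from high to low.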

Also check this thread:

juanvg1972
Pyrite | Level 9

Thanks for your help, Gergely

Is there any way to get the importance of the variables in the model?

I want to know how important each variable is in the model.

Thanks in advance,
