Programming the statistical procedures from SAS

Decision tree in SAS

Frequent Contributor
Posts: 85

Decision tree in SAS

Hi,

I want to build a decision tree model with SAS. I don't know whether I can do it in Enterprise Guide; I couldn't find any task for it.

Is Enterprise Guide required? Can I do it with a Base SAS procedure?

I want to build and use a model with decision tree algorithms.

Something similar to this logistic regression, but with a decision tree:

/* Build model1 */
proc logistic data=entreno outmodel=model1;
  class cod_tarifa;
  model hc_consumo=cod_tarifa cod_segmento;
run;

/* Use model1 to score new data */
proc logistic inmodel=model1;
  score data=test out=test1;
run;


Thanks in advance.


Accepted Solutions
Solution
‎05-15-2015 10:19 AM
Valued Guide
Posts: 3,206

Re: Decision tree in SAS

Gergely, PROC SPLIT and PROC DMSPLIT also exist in Enterprise Miner. HPSPLIT is an improvement on all of those and has production status. HPSPLIT is part of SAS/STAT, not EM (my mistake).

Some of SAS's decisions about procedures and licenses are difficult to understand.

---->-- ja karman --<-----



All Replies
Valued Guide
Posts: 3,206

Re: Decision tree in SAS

PROC ARBORETUM. Unfortunately, it is part of the EM (Enterprise Miner) license. PROC LOGISTIC is not part of Base SAS either; it belongs to SAS/STAT: SAS/STAT(R) 13.1 User's Guide
http://support.sas.com/documentation/onlinedoc/miner/em43/allproc.pdf   http://support.sas.com/resources/papers/proceedings11/155-2011.pdf

If you have SAS VA (Visual Analytics), a decision tree is also available there as an option: SAS(R) Visual Analytics 7.1: User's Guide

---->-- ja karman --<-----
SAS Employee
Posts: 340

Re: Decision tree in SAS

Check proc hpsplit

It is part of SAS/STAT

SAS/STAT(R) 13.2 User's Guide: High-Performance Procedures

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Thanks, Gergely.

I have seen that you can create a model (decision tree) with proc hpsplit and apply it.

For example:

proc hpsplit data=sashelp.hmeq maxdepth=7 maxbranch=2;
  target BAD;
  input DELINQ DEROG JOB NINQ REASON / level=nom;
  input CLAGE CLNO DEBTINC LOAN MORTDUE VALUE YOJ / level=int;
  prune misc / N <= 10;
  partition fraction(validate=0.2);
  code file='hpsplhme-code.sas';
run;

data scored;
  set sashelp.hmeq;
  %include 'hpsplhme-code.sas';
run;


My question is: is it a reliable procedure? Can I use it instead of ARBORETUM? I don't have an Enterprise Miner license.

If you could tell me about the differences, I would appreciate it.

Thanks,

Juan

SAS Employee
Posts: 340

Re: Decision tree in SAS

Yes, HPSPLIT is a reliable procedure. It has production status. It is part of SAS/STAT, so you don't need an Enterprise Miner license to use it.

HPSPLIT is a newer procedure than ARBORETUM. Direct use of the latter is not officially supported by SAS; it is supported only when you use it through Enterprise Miner nodes. (This does not mean it does not work; it only means you cannot open a ticket with SAS.) You also need an EM license to use PROC ARBORETUM.

There are many common and distinct features. I don't know of a document that would summarize those.

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Hi Jaap,

I am having problems running PROC HPSPLIT and would like some help:

proc hpsplit data=test maxdepth=8 maxbranch=2 codefile='modelo_dtree.sas';
  target baja;
  input comp_pend edad;
run;

ERROR: HPSPLIT was unable to open the code file for output.

I have also tried the full path '\home\user\sas\models\modelo_dtree.sas', but it doesn't work either.

Is there another way to save the model, something like outmodel= in PROC LOGISTIC?

Can you or anybody help me?

Thanks

SAS Employee
Posts: 340

Re: Decision tree in SAS

Check your permission settings on the UNIX folder. Who is the owner (user name) of the SAS process?

No, there is no outmodel= or similar option.

Try with this:

codefile="%sysfunc(pathname(work))/model_dtree.sas"

This writes the file to the location of the WORK library, where you certainly have write permission.
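A minimal sketch of that approach, assuming the data set and variables from the question (test, baja, comp_pend, edad) and using the CODE statement shown earlier in the thread instead of the codefile= option. The macro variable name codefile is only illustrative.

```sas
/* Resolve the physical path of WORK and build the code file name */
%let codefile = %sysfunc(pathname(work))/model_dtree.sas;
%put NOTE: code file will be written to &codefile;

proc hpsplit data=test maxdepth=8 maxbranch=2;
  target baja;
  input comp_pend edad;
  code file="&codefile";   /* score code is written to the WORK location */
run;

/* FILEEXIST returns 1 if the file was created, 0 otherwise */
%put NOTE: code file exists = %sysfunc(fileexist(&codefile));
```

Keep in mind that WORK is cleared when the session ends, so copy the file to a permanent folder once the permissions there are sorted out.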

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Thanks Gergely,

Now it works properly. Is there any way to visualize the model's tree, as a graph with branches and nodes?

This is what I am running:

proc hpsplit data=test maxdepth=8 maxbranch=2;
  target baja;
  input comp_pend;
  code file="%sysfunc(pathname(work))/model_dtree.sas";
run;

/* Apply the model: */
data validar2;
  set validar;
  %include "%sysfunc(pathname(work))/model_dtree.sas";
run;

Thanks!!

SAS Employee
Posts: 340

Re: Decision tree in SAS

In SAS/STAT 14.1 HPSPLIT will have nice diagrams:

https://support.sas.com/rnd/app/stat/papers/2015/SAS1940_stokes.pdf

Right now you have to assemble the graph yourself from the NODESTATS= datasets.
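As a starting point for assembling the graph, a small sketch that saves the node statistics and lists them. The data set name arbol is only illustrative, and the columns of the NODESTATS= data set may differ between releases, so the sketch inspects them with PROC CONTENTS rather than assuming any names.

```sas
/* Fit the tree and save one row of statistics per node */
proc hpsplit data=test maxdepth=8 maxbranch=2 nodestats=work.arbol;
  target baja;
  input comp_pend;
run;

/* List the columns that this release writes to the NODESTATS= data set */
proc contents data=work.arbol;
run;

/* Look at the first nodes to see how parents and children are identified */
proc print data=work.arbol(obs=20);
run;
```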

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Hi Gergely,

I am working with SAS OnDemand for Academics in Enterprise Guide.

I can work with proc hpsplit in SAS/STAT module.

I have tested the methods explained in the document you mentioned (SAS1940_stokes.pdf).

They don't work in my version: statements like MODEL or CLASS don't exist in my version:

[Screenshot: hpsplit.png]

I can run this properly:

proc hpsplit data=test maxdepth=4 maxbranch=2;
  target res_campaña; /* variable to predict */
  input tipo_cliente compras_3m compras_12m; /* predictor variables */
  code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */
run;

I would like to obtain the graph of the decision tree, the ROC curve and the cross validation matrix.

Is that possible?

Thanks in advance,

SAS Employee
Posts: 340

Re: Decision tree in SAS

Hi,

At the link below you can find the User's Guide: High-Performance Procedures for all versions of SAS:

SAS/STAT

Check your version, and look at the corresponding description.

It seems that in SAS/STAT 14.1 the HPSPLIT syntax will be more similar to that of the other SAS/STAT procedures (CLASS statement, MODEL statement).

You can write the predictions of HPSPLIT to a data set, then use standard tools to produce the ROC curve (PROC SGPLOT) or the CV matrix (PROC MEANS).
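For the matrix of actual versus predicted classes, one possible sketch is shown here. It uses PROC FREQ rather than PROC MEANS, works on a small made-up scored data set, and assumes a 0.5 cutoff. The names scored_demo, actual, p_event and predicted are hypothetical; with real data you would use the target and the predicted probability column produced by the score code.

```sas
/* Hypothetical scored data: actual class and predicted probability of class F */
data scored_demo;
  input actual $ p_event;
  length predicted $ 1;
  /* Classify at a 0.5 cutoff (an assumption, not a recommendation) */
  predicted = ifc(p_event >= 0.5, 'F', 'M');
  datalines;
F 0.90
F 0.40
M 0.20
M 0.70
F 0.80
M 0.10
;
run;

/* Cross-tabulate actual against predicted class */
proc freq data=scored_demo;
  tables actual*predicted / norow nocol nopercent;
run;
```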

You can also use PROC LOGISTIC with the NOFIT option to produce the ROC curve "automatically":

SAS/STAT(R) 13.2 User's Guide

To produce tree graphs you could:

- Use PROC NETDRAW. It is part of SAS/OR (I think it is available on SAS OnDemand for Academics): SAS/OR(R) 13.2 User's Guide: Project Management

- Calculate the layout of the tree, then use lines, circles, polygons, etc. to draw it (SGPLOT). Not easy.

- Output the structure of the tree to a text file, then use a graph drawing tool like GraphViz. (Sorry, I know it is not available on SAS OnDemand for Academics, but if you download that text file, you can run a locally installed GraphViz on it.)

- Generate an interactive tree with SAS/GRAPH: SAS/GRAPH(R) 9.4: Java Applets and ActiveX Control User's Guide (Check technical requirements here.)

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Thanks very much Gergely,

Can you give an example of an ROC curve using PROC SGPLOT? Do I have to use the scored table?

In my case that is the validar_res table:

proc hpsplit data=test maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats= saves the tree */
  target res_campaña; /* variable to predict */
  input tipo_cliente edad compras_3m compras_12m; /* predictor variables */
  code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */
  rules file="/home/juanvg1972/ficheros/rules_dtree.txt"; /* rules applied */
run;

/* Apply the model: */
data validar_res;
  set validar;
  %include "%sysfunc(pathname(work))/model_dtree.sas";
run;

Thanks,

SAS Employee
Posts: 340

Re: Decision tree in SAS

Hi,

Here's an example using sashelp.class (not thoroughly tested):

proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats= saves the tree */
  target sex; /* variable to predict */
  input height; /* predictor variables */
  code file="c:\temp\model_dtree.sas"; /* saves the fitted model */
  rules file="c:\temp\rules_dtree.txt"; /* rules applied */
run;

/* Apply the model: */
data valid;
  set sashelp.class;
  %include "c:\temp\model_dtree.sas";
run;

/* Count the observations in each class */
proc means data=valid noprint nway;
  var P_SexM;
  class sex;
  output out=outmeans n=n;
run;

/* Sort by descending predicted probability of the event (F) */
proc sort data=valid out=valid_sorted;
  by descending P_SexF;
run;

data valid_cum;
  if _N_=1 then do;
    TP=0; FP=0;
    set outmeans(where=(sex='F') keep=sex _FREQ_ rename=(_FREQ_=FN)); /* initially FN = number of females */
    set outmeans(where=(sex='M') keep=sex _FREQ_ rename=(_FREQ_=TN)); /* initially TN = number of males */
    sens=0; _1mspec=0; /* starting point (0,0) of the curve */
    output;
  end;
  set valid_sorted;
  by descending P_SexF;
  TP+(sex='F'); FP+(sex ne 'F');
  FN+(-(sex='F')); TN+(-(sex ne 'F'));
  sens=TP/(TP+FN);
  _1mspec=1-(TN/(TN+FP));
  if last.P_SexF then output; /* one point per distinct probability, so ties do not depend on sort order */
run;

proc sgplot data=valid_cum;
  series x=_1mspec y=sens;
run;
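The PROC LOGISTIC route with NOFIT mentioned earlier in the thread could look roughly like this. It is only a sketch: it assumes the scored data set valid from the example above, with the probability column P_SexF created by the HPSPLIT score code, and it has not been checked against every SAS/STAT release.

```sas
ods graphics on;

proc logistic data=valid;
  /* NOFIT: no new model is fitted; the ROC statement takes the tree's probabilities as given */
  model sex(event='F') = P_SexF / nofit;
  roc 'Decision tree' pred=P_SexF;
run;
```

With this approach PROC LOGISTIC handles tied probabilities and reports the area under the curve, so the manual cumulative counts are not needed.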

Also check this thread:

Frequent Contributor
Posts: 85

Re: Decision tree in SAS

Thanks for your help, Gergely

Is there any way to get the importance of the variables in the model?

I want to know how important each variable is in the model.

Thanks in advance,

SAS Employee
Posts: 340

Re: Decision tree in SAS
