- Decision tree in SAS

05-15-2015 05:21 AM

Hi,

I want to build a decision tree model with SAS. I don't know if I can do it with Enterprise Guide, but I didn't find any task for it.

Is Enterprise needed? Can I do it with a Base SAS proc?

I want to build and use a model with decision tree algorithms.

Something similar to this logistic regression, but with a decision tree:

/* Build the model1 */

proc logistic data=entreno outmodel=model1;

class cod_tarifa;

model hc_consumo=cod_tarifa cod_segmento;

run;

/* using model1 */

proc logistic inmodel=model1;

score data=test out=test1;

run;

Thanks in advance.

Accepted Solutions

Solution

Posted in reply to juanvg1972

05-15-2015 10:19 AM

Gergely, SPLIT and DMSPLIT also exist in Enterprise Miner. HPSPLIT is an improvement on all of those and has production status. HPSPLIT is part of SAS/STAT, not EM (my mistake).

Some decisions of SAS on procs/licenses are difficult to understand.

---->-- ja karman --<-----

All Replies

Posted in reply to juanvg1972

05-15-2015 05:59 AM

PROC ARBORETUM. Unfortunately, it is part of the EM (Enterprise Miner) license. PROC LOGISTIC is not part of Base SAS either; it belongs to SAS/STAT (see the SAS/STAT(R) 13.1 User's Guide).

http://support.sas.com/documentation/onlinedoc/miner/em43/allproc.pdf http://support.sas.com/resources/papers/proceedings11/155-2011.pdf

If you have SAS VA, a decision tree is also available there as an option (see the SAS(R) Visual Analytics 7.1: User's Guide).

---->-- ja karman --<-----

Posted in reply to juanvg1972

05-15-2015 08:21 AM

Posted in reply to gergely_batho

05-15-2015 09:38 AM

Thanks, Gergely.

I have seen that you can create a model (decision tree) with proc hpsplit and apply it.

For example:

proc hpsplit data=sashelp.hmeq maxdepth=7 maxbranch=2;

target BAD;

input DELINQ DEROG JOB NINQ REASON / level=nom;

input CLAGE CLNO DEBTINC LOAN MORTDUE VALUE YOJ / level=int;

prune misc / N <= 10;

partition fraction(validate=0.2);

code file='hpsplhme-code.sas';

run;

data scored;

set sashelp.hmeq;

%include 'hpsplhme-code.sas';

run;

My question is: is it a reliable proc? Can I use it instead of ARBORETUM? I don't have an Enterprise Miner license.

If you could tell me about the differences...

Thanks,

Juan

Posted in reply to juanvg1972

05-15-2015 09:55 AM

Yes, HPSPLIT is a reliable proc. It has production status. It is part of SAS/STAT, so you don't need an Enterprise Miner license to use it.

HPSPLIT is a newer procedure than ARBORETUM. The **direct** use of the latter is not officially supported by SAS; it is supported only when you use it via Enterprise Miner nodes. (This does not mean it doesn't work, only that you cannot open a ticket with SAS.) You also need an EM license to use PROC ARBORETUM.

There are many common and distinct features. I don't know of a document that would summarize those.

Posted in reply to gergely_batho

05-15-2015 06:11 PM

Hi Jaap,

I am having problems executing proc hpsplit and would like some help:

proc hpsplit data=test maxdepth=8 maxbranch=2 codefile='modelo_dtree.sas';

target baja;

input comp_pend edad ;

run;

ERROR: HPSPLIT was unable to open the code file for output.

I have tried with the full path '\home\user\sas\models\modelo_dtree.sas' but it doesn't work.

Is there any other way to save the model, like outmodel= in proc logistic?

Can you or anybody help me?

Thanks

Posted in reply to juanvg1972

05-15-2015 07:20 PM

Check your permission settings on the UNIX folder. Who is the owner (user name) of the SAS process?

No, there is no outmodel=, or similar.

Try with this:

codefile="%sysfunc(pathname(work))/model_dtree.sas"

This writes the file to the location of the WORK library. You certainly have write permission there.

Posted in reply to gergely_batho

05-16-2015 05:38 AM

Thanks Gergely,

Now it works properly. Is there any way to visualize the tree of the model? A graphic with the branches and nodes...

This is what I am doing:

proc hpsplit data=test maxdepth=8 maxbranch=2;

target baja;

input comp_pend ;

code file="%sysfunc(pathname(work))/model_dtree.sas";

run;

/* Apply the model: */

data validar2;

set validar;

%include "%sysfunc(pathname(work))/model_dtree.sas";

run;

Thanks!!

Posted in reply to juanvg1972

05-18-2015 08:36 PM

In SAS/STAT 14.1 HPSPLIT will have nice diagrams:

https://support.sas.com/rnd/app/stat/papers/2015/SAS1940_stokes.pdf

Right now you have to assemble the graph yourself from the NODESTATS= datasets.

Posted in reply to gergely_batho

05-26-2015 07:04 AM

Hi Gergely,

I am working with SAS OnDemand for Academics in Enterprise Guide.

I can work with proc hpsplit in SAS/STAT module.

I have tested the methods explained in the document you mentioned (SAS1940_stokes.pdf).

They don't work in my version; statements like model or class don't exist in it.

I can run this properly:

proc hpsplit data=test maxdepth=4 maxbranch=2;

target res_campaña; /* variable to predict */

input tipo_cliente compras_3m compras_12m; /* variables used as predictors */

code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */

run;

I would like to obtain the graph of the decision tree, the ROC curve and the cross validation matrix.

Is that possible?

Thanks in advance,

Posted in reply to juanvg1972

05-26-2015 09:33 AM

Hi,

At the link below you can find the User's Guide: High-Performance Procedures for all versions of SAS:

Check your version, and look at the appropriate description.

It seems that in SAS/STAT 14.1 the HPSPLIT syntax will be more similar to the other SAS/STAT procedures (class statement, model statement).

You can write the predictions of HPSPLIT to a dataset, then use standard tools to produce the ROC curve (PROC SGPLOT) or the CV matrix (PROC MEANS).

You can also use PROC LOGISTIC with the **nofit** option to produce the ROC curve "automatically":
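A minimal sketch of the NOFIT approach, assuming a scored data set that holds the actual target and a predicted probability written by the HPSPLIT score code. The names `scored`, `target` and `P_target1` are hypothetical placeholders, and the PRED= option of the ROC statement is only available in SAS/STAT releases that support it:

```sas
ods graphics on;

/* SCORED, TARGET and P_target1 are hypothetical names: replace them with
   your scored data set, your actual target variable and the probability
   column created by the score code. */
proc logistic data=scored;
   /* NOFIT: no model is fitted; the listed variable is only evaluated */
   model target(event='1') = P_target1 / nofit;
   /* PRED= tells the ROC statement that the variable already holds predicted probabilities */
   roc 'Decision tree' pred=P_target1;
run;
```

The EVENT= value must match the target level whose probability is stored in the PRED= variable; otherwise the curve is mirrored.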

To produce tree graphs you could:

- Use PROC NETDRAW. It is part of SAS/OR (I think it is available on SAS OnDemand for Academics): SAS/OR(R) 13.2 User's Guide: Project Management

- Calculate the layout of the tree, then use lines, circles, polygons, etc. to draw it (SGPLOT). Not easy.

- Output the structure of the tree to a text file, then use a graph drawing tool like GraphViz. (Sorry, I know it is not available on SAS OnDemand for Academics, but if you download that text file, you can run a locally installed GraphViz on it.)

- Generate interactive tree with SAS/GRAPH: SAS/GRAPH(R) 9.4: Java Applets and ActiveX Control User's Guide (Check technical requirements here.)

Posted in reply to gergely_batho

05-26-2015 09:54 AM

Thanks very much Gergely,

Can you give an example of a **ROC curve** using **proc sgplot**? Do I have to use the scored table?

In my case, that is the validar_res table:

proc hpsplit data=test maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats saves the tree */

target res_campaña; /* variable to predict */

input tipo_cliente edad compras_3m compras_12m; /* variables used as predictors */

code file="%sysfunc(pathname(work))/model_dtree.sas"; /* saves the fitted model */

rules file="/home/juanvg1972/ficheros/rules_dtree.txt"; /* rules applied */

run;

/* Apply the model: */

data validar_res;

set validar;

%include "%sysfunc(pathname(work))/model_dtree.sas";

run;

Thanks,

Posted in reply to juanvg1972

05-26-2015 11:14 AM

Hi,

Here's an example using sashelp.class (not thoroughly tested):

proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2 nodestats=arbol; /* nodestats saves the tree */

target sex; /* variable to predict */

input height; /* variables used as predictors */

code file="c:\temp\model_dtree.sas"; /* saves the fitted model */

rules file="c:\temp\rules_dtree.txt"; /* rules applied */

run;

/* Apply the model: */

data valid;

set sashelp.class;

%include "c:\temp\model_dtree.sas";

run;

proc means data=valid noprint nway;

var P_SexM;

class sex;

output out=outmeans n=n;

run;

proc sort data=valid out=valid_sorted;

by descending P_SexF;

run;

data valid_cum;

if _N_=1 then do;

TP=0;FP=0;

sens=0;_1mspec=0;/*so that the first output row is the origin (0,0) of the curve*/

set outmeans(where=(sex='F') keep=sex _FREQ_ rename=(_FREQ_=FN));/*initially FN=Num of females*/

set outmeans(where=(sex='M') keep=sex _FREQ_ rename=(_FREQ_=TN));/*initially TN=Num of males*/

output;

end;

set valid_sorted end=last;

TP+(sex='F');FP+(sex~='F');

FN+(-(sex='F'));TN+(-(sex~='F'));

sens=TP/(TP+FN);

_1mspec=1-(TN/(TN+FP));

output;

run;

proc sgplot data=valid_cum;

series x=_1mspec y=sens;

run;
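For the classification matrix asked about earlier in the thread, here is a minimal sketch that reuses the `valid` data set and the `P_SexF` probability from the example above. The 0.5 cutoff and the names `valid_class` and `predicted` are assumptions made for illustration. Because `valid` is the training data here, the result is not a cross-validated estimate:

```sas
/* Turn the predicted probability into a predicted class.
   The 0.5 cutoff is an arbitrary assumption; VALID_CLASS and
   PREDICTED are hypothetical names. */
data valid_class;
   set valid;
   length predicted $1;
   if P_SexF >= 0.5 then predicted = 'F';
   else predicted = 'M';
run;

/* Cross-tabulate actual against predicted class (counts only) */
proc freq data=valid_class;
   tables sex*predicted / norow nocol nopercent;
run;
```

To get an honest estimate, apply the same two steps to a data set that was not used to grow the tree.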

Also check this thread:

Posted in reply to gergely_batho

05-26-2015 06:36 PM

Thanks for your help, Gergely

Is there any way to get the importance of the variables in the model?

I want to know how important each variable is in the model.

Thanks in advance,

Posted in reply to juanvg1972

05-26-2015 06:56 PM

output importance=
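A minimal sketch of where that option goes, assuming the TARGET/INPUT syntax used earlier in this thread (the syntax may differ in later releases). The data set name `varimp` is a hypothetical choice:

```sas
proc hpsplit data=sashelp.class maxdepth=4 maxbranch=2;
   target sex;
   input height weight age;
   /* IMPORTANCE= names the output data set; VARIMP is a hypothetical name */
   output importance=varimp;
run;

/* Inspect the variable importance table */
proc print data=varimp;
run;
```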