Using Start & End Group Nodes for Boosting & Bagging

New Contributor
Posts: 2

I have a project in which I am trying to classify a rare event, so I would like to experiment with boosting and bagging, following the example laid out in this article: https://support.sas.com/resources/papers/proceedings14/SAS133-2014.pdf

The problem is that when I set up my project diagram like the one on page 3 and try to run the process flow, I get a run-time error as soon as EM tries to process the first Start Groups node. I have posted the output from the log file below. Any help would be greatly appreciated.

20588      %let SYSCC=0;

20589      %let SYSRC=0;

20590      %let EMEXCEPTIONSTRING=;

20591      %let SYSMSG=;

20592      %em_diagram(action=setproperties, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, requestFile=DiagramSetPropertiesRequest.xml);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20593     +%macro em_dsep;

20594     +  %global emdsep;

20595     +   %if %substr(&sysscp, 1, 3)= WIN %then

20596     +       %let emdsep=\;

20597     +   %else

20598     +       %if %substr(&sysscp, 1, 3)= DNT %then

20599     +           %let emdsep=\;

20600     +   %else

20601     +       %let emdsep=/;

20602     +%mend em_dsep;

20603     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: ADDNODE

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

20604      %global EM_REFRESH_PROPERTY;

20605      %let EM_REFRESH_PROPERTY=N;

20606      data _null_;

20607      set EMWS6.EM_NODEID;

20608      where upcase(NODEID)=upcase("EndGrp2");

20609      call symput('_EMCLASS', CLASS);

20610      run;

NOTE: There were 1 observations read from the data set EMWS6.EM_NODEID.

      WHERE UPCASE(NODEID)='ENDGRP2';

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20612      %let syscc=0;

20613      data WORK._EMVISUALPROPERTIES;

20614      length NODEID CLASS COMPONENT $32 X Y 8 LABEL $81;

20615      NODEID = "EndGrp2";

20616      CLASS='';

20617      component='';

20618      X = 698;

20619      Y= 183;

20620      LABEL = "End Groups (2)";

20621      output;

20622      run;

NOTE: The data set WORK._EMVISUALPROPERTIES has 1 observations and 6 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

20623      proc sort data=WORK._EMVISUALPROPERTIES;

20624      by NODEID;

20625      run;

NOTE: There were 1 observations read from the data set WORK._EMVISUALPROPERTIES.

NOTE: The data set WORK._EMVISUALPROPERTIES has 1 observations and 6 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20626      proc sort data=EMWS6.em_nodeid out=_tempNodeid;

20627      by NODEID;

20628      run;

NOTE: There were 8 observations read from the data set EMWS6.EM_NODEID.

NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20629      data _tempNodeid;

20630      update _tempNodeid(in=_a) WORK._EMVISUALPROPERTIES(in=_b);

20631      by NODEID;

20632      if _a then output;

20633      run;

NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.

NOTE: There were 1 observations read from the data set WORK._EMVISUALPROPERTIES.

NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

20634      data EMWS6.em_nodeid;

20635      set _tempNodeid;

20636      run;

NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.

NOTE: The data set EMWS6.EM_NODEID has 8 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20637      proc datasets lib=WORK nolist;

20638      delete _tempNodeid _EMVISUALPROPERTIES;

20639      run;

NOTE: Deleting WORK._TEMPNODEID (memtype=DATA).

NOTE: Deleting WORK._EMVISUALPROPERTIES (memtype=DATA).

20640      quit;

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

70 The SAS System                                        14:45 Monday, April 27, 2015

20641      %let SYSCC=0;

20642      %let SYSRC=0;

20643      %let EMEXCEPTIONSTRING=;

20644      %let SYSMSG=;

20645      %em_diagram(action=connectnode, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, from=Tree2, to=EndGrp2);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20646     +%macro em_dsep;

20647     +  %global emdsep;

20648     +   %if %substr(&sysscp, 1, 3)= WIN %then

20649     +       %let emdsep=\;

20650     +   %else

20651     +       %if %substr(&sysscp, 1, 3)= DNT %then

20652     +           %let emdsep=\;

20653     +   %else

20654     +       %let emdsep=/;

20655     +%mend em_dsep;

20656     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: SETPROPERTIES

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

20657      data EMWS6.EM_DGRAPH;

20658      set EMWS6.EM_DGRAPH;

20659      where ^(FROM = "Tree2" and TO = "");

20660      run;

NOTE: There were 8 observations read from the data set EMWS6.EM_DGRAPH.

      WHERE (FROM not = 'Tree2') or (TO not = ' ');

NOTE: The data set EMWS6.EM_DGRAPH has 8 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20661      %let SYSCC=0;

20662      %let SYSRC=0;

20663      %let EMEXCEPTIONSTRING=;

20664      %let SYSMSG=;

20665      %em_diagram(action=connectnode, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, from=EndGrp2, to=MdlComp);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20666     +%macro em_dsep;

20667     +  %global emdsep;

20668     +   %if %substr(&sysscp, 1, 3)= WIN %then

20669     +       %let emdsep=\;

20670     +   %else

20671     +       %if %substr(&sysscp, 1, 3)= DNT %then

20672     +           %let emdsep=\;

20673     +   %else

20674     +       %let emdsep=/;

20675     +%mend em_dsep;

20676     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: CONNECTNODE

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

20677      data EMWS6.EM_DGRAPH;

20678      set EMWS6.EM_DGRAPH;

20679      where ^(FROM = "EndGrp2" and TO = "");

20680      run;

NOTE: There were 8 observations read from the data set EMWS6.EM_DGRAPH.

      WHERE (FROM not = 'EndGrp2') or (TO not = ' ');

NOTE: The data set EMWS6.EM_DGRAPH has 8 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20681      %let SYSCC=0;

20682      %let SYSRC=0;

20683      %let EMEXCEPTIONSTRING=;

20684      %let SYSMSG=;

20685      %em_diagram(action=setproperties, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, requestFile=DiagramSetPropertiesRequest.xml);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20686     +%macro em_dsep;

20687     +  %global emdsep;

20688     +   %if %substr(&sysscp, 1, 3)= WIN %then

20689     +       %let emdsep=\;

20690     +   %else

20691     +       %if %substr(&sysscp, 1, 3)= DNT %then

20692     +           %let emdsep=\;

20693     +   %else

20694     +       %let emdsep=/;

20695     +%mend em_dsep;

20696     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: CONNECTNODE

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

20697      %global EM_REFRESH_PROPERTY;

20698      %let EM_REFRESH_PROPERTY=N;

20699      data _null_;

20700      set EMWS6.EM_NODEID;

20701      where upcase(NODEID)=upcase("Grp2");

20702      call symput('_EMCLASS', CLASS);

20703      run;

NOTE: There were 1 observations read from the data set EMWS6.EM_NODEID.

      WHERE UPCASE(NODEID)='GRP2';

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20705      %let syscc=0;

20706      data WORK._EMVISUALPROPERTIES;

20707      length NODEID CLASS COMPONENT $32 X Y 8 LABEL $81;

20708      NODEID = "Tree2";

20709      CLASS='';

20710      component='';

20711      X = 456;

20712      Y= 165;

20713      LABEL = "Decision Tree (2)";

20714      output;

20715      NODEID = "FIMPORT";

20716      CLASS='';

20717      component='';

20718      X = 59;

20719      Y= 239;

20720      LABEL = "File Import";

20721      output;

20722      NODEID = "EndGrp";

20723      CLASS='';

20724      component='';

20725      X = 673;

20726      Y= 336;

20727      LABEL = "End Groups";

20728      output;

20729      NODEID = "Tree";

20730      CLASS='';

20731      component='';

20732      X = 477;

20733      Y= 339;

20734      LABEL = "Decision Tree";

20735      output;

20736      NODEID = "EndGrp2";

20737      CLASS='';

20738      component='';

20739      X = 627;

20740      Y= 161;

20741      LABEL = "End Groups (2)";

20742      output;

20743      NODEID = "MdlComp";

20744      CLASS='';

20745      component='';

20746      X = 959;

20747      Y= 240;

20748      LABEL = "Model Comparison";

20749      output;

20750      NODEID = "Grp";

20751      CLASS='';

20752      component='';

20753      X = 262;

20754      Y= 339;

20755      LABEL = "Start Groups (Boosting)";

20756      output;

20757      NODEID = "Grp2";

20758      CLASS='';

20759      component='';

20760      X = 254;

20761      Y= 172;

20762      LABEL = "Start Groups (Bagging)";

20763      output;

20764      run;

NOTE: The data set WORK._EMVISUALPROPERTIES has 8 observations and 6 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

20765      proc sort data=WORK._EMVISUALPROPERTIES;

20766      by NODEID;

20767      run;

NOTE: There were 8 observations read from the data set WORK._EMVISUALPROPERTIES.

NOTE: The data set WORK._EMVISUALPROPERTIES has 8 observations and 6 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20768      proc sort data=EMWS6.em_nodeid out=_tempNodeid;

20769      by NODEID;

20770      run;

NOTE: There were 8 observations read from the data set EMWS6.EM_NODEID.

NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20771      data _tempNodeid;

20772      update _tempNodeid(in=_a) WORK._EMVISUALPROPERTIES(in=_b);

20773      by NODEID;

20774      if _a then output;

20775      run;

NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.

NOTE: There were 8 observations read from the data set WORK._EMVISUALPROPERTIES.

NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

20776      data EMWS6.em_nodeid;

20777      set _tempNodeid;

20778      run;

NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.

NOTE: The data set EMWS6.EM_NODEID has 8 observations and 8 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20779      proc datasets lib=WORK nolist;

20780      delete _tempNodeid _EMVISUALPROPERTIES;

20781      run;

NOTE: Deleting WORK._TEMPNODEID (memtype=DATA).

NOTE: Deleting WORK._EMVISUALPROPERTIES (memtype=DATA).

20782      quit;

NOTE: PROCEDURE DATASETS used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20783      %let SYSCC=0;

20784      %let SYSRC=0;

20785      %let EMEXCEPTIONSTRING=;

20786      %let SYSMSG=;

20787      %em_diagram(action=closesession, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20788     +%macro em_dsep;

20789     +  %global emdsep;

20790     +   %if %substr(&sysscp, 1, 3)= WIN %then

20791     +       %let emdsep=\;

20792     +   %else

20793     +       %if %substr(&sysscp, 1, 3)= DNT %then

20794     +           %let emdsep=\;

20795     +   %else

20796     +       %let emdsep=/;

20797     +%mend em_dsep;

20798     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: SETPROPERTIES

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

NOTE: Libref EMWS6 has been deassigned.

20799      %let SYSCC=0;

20800      %let SYSRC=0;

20801      %let EMEXCEPTIONSTRING=;

20802      %let SYSMSG=;

20803      %em_diagram(action=opensession, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, sessionid=Josh New1430160328326, outfile=DiagramOpenSessionResponse.xml);

NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.

20804     +%macro em_dsep;

20805     +  %global emdsep;

20806     +   %if %substr(&sysscp, 1, 3)= WIN %then

20807     +       %let emdsep=\;

20808     +   %else

20809     +       %if %substr(&sysscp, 1, 3)= DNT %then

20810     +           %let emdsep=\;

20811     +   %else

20812     +       %let emdsep=/;

20813     +%mend em_dsep;

20814     +%em_dsep;

NOTE: %INCLUDE (level 1) ending.

NOTE: Fileref _DGMFRF has been deassigned.

WIP_ACTION: CLOSESESSION

DGMID: EMWS6

LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck

NOTE: Libref EMWS6 was successfully assigned as follows:

      Engine:        V9

      Physical Name: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6

NOTE: There were 9 observations read from the data set EMWS6.EM_DGRAPH.

NOTE: The data set WORK.EM_DGRAPH has 9 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

     

20815      %let syscc=0;

20816      filename _wipchk catalog "EMWS6.EndGrp.test.source";

20817      data _null_;

20818      file _wipchk;

20819      put '/* Test */';

20820      run;

NOTE: The file _WIPCHK is:

      Catalog Name=EMWS6.ENDGRP.TEST.SOURCE,

      Catalog Page Size=4096,

      Number of Catalog Pages=11,

      Created=Monday, April 27, 2015 02:47:34 PM,

      Last Modified=Monday, April 27, 2015 03:17:32 PM,

      Filename=C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\endgrp.sas7bcat,

      Release Created=9.0301M2,

      Host Created=X64_8HOME

NOTE: 1 record was written to the file _WIPCHK.

      The minimum record length was 10.

      The maximum record length was 10.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20821      data _null_;

20822      rc = fdelete('_wipchk');

20823      run;

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

20824      filename _wipchk;

NOTE: Fileref _WIPCHK has been deassigned.

20825      filename _wipxml 'C:\Users\JOSHNE~1\AppData\Local\Temp\SAS Temporary Files\_TD2700_JOSHMOBILE_\Prc2\DiagramOpenSessionResponse.xml' encoding="UTF-8" NOBOM;

WARNING: End of file.

WARNING: End of file.

Super Contributor
Posts: 336

Re: Using Start & End Group Nodes for Boosting & Bagging

Hey Josh,

Glad you are giving the Start/End Group nodes a try!

I could not find an error in your log.

Let's troubleshoot a bit. Try the same flow as on page 3 with the HMEQ data set.

To generate HMEQ, go to Help->Generate Sample Data->Home Equity.

Then create a flow like Start Groups (Mode=Bagging)->Tree->End Groups.

Does that work now?

When this happens to me, 99% of the time it is because I forgot to specify bagging or boosting as the mode. It might be something else in your case, though.

Thanks,

Miguel
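
For readers who want a feel for what a bagging flow does conceptually, here is a minimal sketch of the resampling step written as plain SAS code. It is not the code that the Start Groups node generates. It assumes that the sample library SAMPSIO with the HMEQ table is available in your session, and the output table name and seed are hypothetical.

```sas
/* Conceptual sketch only: draws bootstrap samples for bagging.        */
/* Assumption: SAMPSIO.HMEQ is available; WORK.HMEQ_BOOT is a made-up  */
/* output table name chosen for this illustration.                     */
proc surveyselect data=sampsio.hmeq
    out=work.hmeq_boot
    method=urs      /* unrestricted random sampling, i.e. with replacement */
    samprate=1      /* each sample has as many rows as the input table     */
    outhits         /* write one output row per selection                  */
    reps=10         /* ten bootstrap samples, identified by Replicate      */
    seed=12345;     /* arbitrary seed so the draw is repeatable            */
run;
```

In a bagging ensemble, one model would then be trained on each value of Replicate and the predictions combined, which is roughly the role that the nodes between Start Groups and End Groups play in the diagram.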

New Contributor
Posts: 2

Re: Using Start & End Group Nodes for Boosting & Bagging

Thanks for the reply. I ran the Home Equity data and it worked perfectly. I think the problem was that, instead of using a SAS data table as my data source, I was using the File Import node. I went back and converted my CSV data file into a SAS data table in Enterprise Guide, then imported it into EM, and it works fine. Why is this a problem?

Super Contributor
Posts: 336

Re: Using Start & End Group Nodes for Boosting & Bagging

I am not sure why that would be a problem. I will give it a try with the File Import node and keep you posted.

Good luck with your ensemble models!

Occasional Contributor
Posts: 8

Re: Using Start & End Group Nodes for Boosting & Bagging

I had a similar experience. Using the File Import node with a CSV file failed. So I opened the CSV file with Excel, set the format of each column appropriately, and imported it to our SAS server. Using the Data Input node, bagging worked as advertised. Fantastic!

Super Contributor
Posts: 336

Re: Using Start & End Group Nodes for Boosting & Bagging

JPW,

Also, we should be looking at the log of the node that errored out.

Right-click that node and click Results. Once the results open, go to View->SAS Results->Log.

Thanks,

SAS Employee
Posts: 122

Re: Using Start & End Group Nodes for Boosting & Bagging

Hi,

This SAS Technical Support note appears to suggest a metadata issue when the File Import node is used directly in the flow:

http://support.sas.com/kb/55/675.html

"Problem Note 55675: The Start Groups node gives a "file ... does not exist" error when bagging or boosting if your flow contains a File Import node"
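
The workaround reported earlier in this thread (converting the CSV file to a SAS data table first, then using that table as the data source instead of a File Import node) can be sketched in code roughly as follows. This is only an illustration under stated assumptions: the library name, both paths, and the data set name are hypothetical placeholders, and the right GUESSINGROWS value depends on your file.

```sas
/* Sketch of the CSV-to-SAS-table workaround described in this thread. */
/* All paths and names below are hypothetical; replace with your own.  */
libname mydata "C:\path\to\sas\library";   /* assumed permanent library */

proc import datafile="C:\path\to\your_file.csv"   /* assumed CSV location */
    out=mydata.practicum     /* hypothetical output data set name         */
    dbms=csv
    replace;                 /* overwrite the table if it already exists  */
    getnames=yes;            /* first row of the CSV holds column names   */
    guessingrows=1000;       /* rows scanned to guess types and lengths   */
run;

/* Check the variable types and lengths that PROC IMPORT chose. */
proc contents data=mydata.practicum;
run;
```

Once the table exists, create an Enterprise Miner data source from it and connect that data source to the Start Groups node in place of the File Import node.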