I have a project in which I am trying to classify a rare event, so I would like to experiment with boosting and bagging, following the example laid out in this article: https://support.sas.com/resources/papers/proceedings14/SAS133-2014.pdf
The problem is that when I set up my project diagram the same way as on page 3 and try to run the process flow, I get a run-time error as soon as EM (Enterprise Miner) tries to process the first Start Groups node. I have posted the output from the log file below. Any help would be greatly appreciated.
20588 %let SYSCC=0;
20589 %let SYSRC=0;
20590 %let EMEXCEPTIONSTRING=;
20591 %let SYSMSG=;
20592 %em_diagram(action=setproperties, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, requestFile=DiagramSetPropertiesRequest.xml);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20593 +%macro em_dsep;
20594 + %global emdsep;
20595 + %if %substr(&sysscp, 1, 3)= WIN %then
20596 + %let emdsep=\;
20597 + %else
20598 + %if %substr(&sysscp, 1, 3)= DNT %then
20599 + %let emdsep=\;
20600 + %else
20601 + %let emdsep=/;
20602 +%mend em_dsep;
20603 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: ADDNODE
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
20604 %global EM_REFRESH_PROPERTY;
20605 %let EM_REFRESH_PROPERTY=N;
20606 data _null_;
20607 set EMWS6.EM_NODEID;
20608 where upcase(NODEID)=upcase("EndGrp2");
20609 call symput('_EMCLASS', CLASS);
20610 run;
NOTE: There were 1 observations read from the data set EMWS6.EM_NODEID.
WHERE UPCASE(NODEID)='ENDGRP2';
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20612 %let syscc=0;
20613 data WORK._EMVISUALPROPERTIES;
20614 length NODEID CLASS COMPONENT $32 X Y 8 LABEL $81;
20615 NODEID = "EndGrp2";
20616 CLASS='';
20617 component='';
20618 X = 698;
20619 Y= 183;
20620 LABEL = "End Groups (2)";
20621 output;
20622 run;
NOTE: The data set WORK._EMVISUALPROPERTIES has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
20623 proc sort data=WORK._EMVISUALPROPERTIES;
20624 by NODEID;
20625 run;
NOTE: There were 1 observations read from the data set WORK._EMVISUALPROPERTIES.
NOTE: The data set WORK._EMVISUALPROPERTIES has 1 observations and 6 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20626 proc sort data=EMWS6.em_nodeid out=_tempNodeid;
20627 by NODEID;
20628 run;
NOTE: There were 8 observations read from the data set EMWS6.EM_NODEID.
NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20629 data _tempNodeid;
20630 update _tempNodeid(in=_a) WORK._EMVISUALPROPERTIES(in=_b);
20631 by NODEID;
20632 if _a then output;
20633 run;
NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.
NOTE: There were 1 observations read from the data set WORK._EMVISUALPROPERTIES.
NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
20634 data EMWS6.em_nodeid;
20635 set _tempNodeid;
20636 run;
NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.
NOTE: The data set EMWS6.EM_NODEID has 8 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20637 proc datasets lib=WORK nolist;
20638 delete _tempNodeid _EMVISUALPROPERTIES;
20639 run;
NOTE: Deleting WORK._TEMPNODEID (memtype=DATA).
NOTE: Deleting WORK._EMVISUALPROPERTIES (memtype=DATA).
20640 quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
70 The SAS System 14:45 Monday, April 27, 2015
20641 %let SYSCC=0;
20642 %let SYSRC=0;
20643 %let EMEXCEPTIONSTRING=;
20644 %let SYSMSG=;
20645 %em_diagram(action=connectnode, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, from=Tree2, to=EndGrp2);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20646 +%macro em_dsep;
20647 + %global emdsep;
20648 + %if %substr(&sysscp, 1, 3)= WIN %then
20649 + %let emdsep=\;
20650 + %else
20651 + %if %substr(&sysscp, 1, 3)= DNT %then
20652 + %let emdsep=\;
20653 + %else
20654 + %let emdsep=/;
20655 +%mend em_dsep;
20656 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: SETPROPERTIES
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
20657 data EMWS6.EM_DGRAPH;
20658 set EMWS6.EM_DGRAPH;
20659 where ^(FROM = "Tree2" and TO = "");
20660 run;
NOTE: There were 8 observations read from the data set EMWS6.EM_DGRAPH.
WHERE (FROM not = 'Tree2') or (TO not = ' ');
NOTE: The data set EMWS6.EM_DGRAPH has 8 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20661 %let SYSCC=0;
20662 %let SYSRC=0;
20663 %let EMEXCEPTIONSTRING=;
20664 %let SYSMSG=;
20665 %em_diagram(action=connectnode, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, from=EndGrp2, to=MdlComp);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20666 +%macro em_dsep;
20667 + %global emdsep;
20668 + %if %substr(&sysscp, 1, 3)= WIN %then
20669 + %let emdsep=\;
20670 + %else
20671 + %if %substr(&sysscp, 1, 3)= DNT %then
20672 + %let emdsep=\;
20673 + %else
20674 + %let emdsep=/;
20675 +%mend em_dsep;
20676 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: CONNECTNODE
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
20677 data EMWS6.EM_DGRAPH;
20678 set EMWS6.EM_DGRAPH;
20679 where ^(FROM = "EndGrp2" and TO = "");
20680 run;
NOTE: There were 8 observations read from the data set EMWS6.EM_DGRAPH.
WHERE (FROM not = 'EndGrp2') or (TO not = ' ');
NOTE: The data set EMWS6.EM_DGRAPH has 8 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20681 %let SYSCC=0;
20682 %let SYSRC=0;
20683 %let EMEXCEPTIONSTRING=;
20684 %let SYSMSG=;
20685 %em_diagram(action=setproperties, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, requestFile=DiagramSetPropertiesRequest.xml);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20686 +%macro em_dsep;
20687 + %global emdsep;
20688 + %if %substr(&sysscp, 1, 3)= WIN %then
20689 + %let emdsep=\;
20690 + %else
20691 + %if %substr(&sysscp, 1, 3)= DNT %then
20692 + %let emdsep=\;
20693 + %else
20694 + %let emdsep=/;
20695 +%mend em_dsep;
20696 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: CONNECTNODE
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
20697 %global EM_REFRESH_PROPERTY;
20698 %let EM_REFRESH_PROPERTY=N;
20699 data _null_;
20700 set EMWS6.EM_NODEID;
20701 where upcase(NODEID)=upcase("Grp2");
20702 call symput('_EMCLASS', CLASS);
20703 run;
NOTE: There were 1 observations read from the data set EMWS6.EM_NODEID.
WHERE UPCASE(NODEID)='GRP2';
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20705 %let syscc=0;
20706 data WORK._EMVISUALPROPERTIES;
20707 length NODEID CLASS COMPONENT $32 X Y 8 LABEL $81;
20708 NODEID = "Tree2";
20709 CLASS='';
20710 component='';
20711 X = 456;
20712 Y= 165;
20713 LABEL = "Decision Tree (2)";
20714 output;
20715 NODEID = "FIMPORT";
20716 CLASS='';
20717 component='';
20718 X = 59;
20719 Y= 239;
20720 LABEL = "File Import";
20721 output;
20722 NODEID = "EndGrp";
20723 CLASS='';
20724 component='';
20725 X = 673;
20726 Y= 336;
20727 LABEL = "End Groups";
20728 output;
20729 NODEID = "Tree";
20730 CLASS='';
20731 component='';
20732 X = 477;
20733 Y= 339;
20734 LABEL = "Decision Tree";
20735 output;
20736 NODEID = "EndGrp2";
20737 CLASS='';
20738 component='';
20739 X = 627;
20740 Y= 161;
20741 LABEL = "End Groups (2)";
20742 output;
20743 NODEID = "MdlComp";
20744 CLASS='';
20745 component='';
20746 X = 959;
20747 Y= 240;
20748 LABEL = "Model Comparison";
20749 output;
20750 NODEID = "Grp";
20751 CLASS='';
20752 component='';
20753 X = 262;
20754 Y= 339;
20755 LABEL = "Start Groups (Boosting)";
20756 output;
20757 NODEID = "Grp2";
20758 CLASS='';
20759 component='';
20760 X = 254;
20761 Y= 172;
20762 LABEL = "Start Groups (Bagging)";
20763 output;
20764 run;
NOTE: The data set WORK._EMVISUALPROPERTIES has 8 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
20765 proc sort data=WORK._EMVISUALPROPERTIES;
20766 by NODEID;
20767 run;
NOTE: There were 8 observations read from the data set WORK._EMVISUALPROPERTIES.
NOTE: The data set WORK._EMVISUALPROPERTIES has 8 observations and 6 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20768 proc sort data=EMWS6.em_nodeid out=_tempNodeid;
20769 by NODEID;
20770 run;
NOTE: There were 8 observations read from the data set EMWS6.EM_NODEID.
NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20771 data _tempNodeid;
20772 update _tempNodeid(in=_a) WORK._EMVISUALPROPERTIES(in=_b);
20773 by NODEID;
20774 if _a then output;
20775 run;
NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.
NOTE: There were 8 observations read from the data set WORK._EMVISUALPROPERTIES.
NOTE: The data set WORK._TEMPNODEID has 8 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
20776 data EMWS6.em_nodeid;
20777 set _tempNodeid;
20778 run;
NOTE: There were 8 observations read from the data set WORK._TEMPNODEID.
NOTE: The data set EMWS6.EM_NODEID has 8 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20779 proc datasets lib=WORK nolist;
20780 delete _tempNodeid _EMVISUALPROPERTIES;
20781 run;
NOTE: Deleting WORK._TEMPNODEID (memtype=DATA).
NOTE: Deleting WORK._EMVISUALPROPERTIES (memtype=DATA).
20782 quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20783 %let SYSCC=0;
20784 %let SYSRC=0;
20785 %let EMEXCEPTIONSTRING=;
20786 %let SYSMSG=;
20787 %em_diagram(action=closesession, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20788 +%macro em_dsep;
20789 + %global emdsep;
20790 + %if %substr(&sysscp, 1, 3)= WIN %then
20791 + %let emdsep=\;
20792 + %else
20793 + %if %substr(&sysscp, 1, 3)= DNT %then
20794 + %let emdsep=\;
20795 + %else
20796 + %let emdsep=/;
20797 +%mend em_dsep;
20798 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: SETPROPERTIES
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
NOTE: Libref EMWS6 has been deassigned.
20799 %let SYSCC=0;
20800 %let SYSRC=0;
20801 %let EMEXCEPTIONSTRING=;
20802 %let SYSMSG=;
20803 %em_diagram(action=opensession, projpath=%nrstr(C:\Users\Josh New\Downloads\Practicum Project), projname=%nrstr(practicum_model), dgmId=EMWS6, sessionid=Josh New1430160328326, outfile=DiagramOpenSessionResponse.xml);
NOTE: %INCLUDE (level 1) file _DGMFRF is file SASHELP.EMWIP.EM_DSEP.SOURCE.
20804 +%macro em_dsep;
20805 + %global emdsep;
20806 + %if %substr(&sysscp, 1, 3)= WIN %then
20807 + %let emdsep=\;
20808 + %else
20809 + %if %substr(&sysscp, 1, 3)= DNT %then
20810 + %let emdsep=\;
20811 + %else
20812 + %let emdsep=/;
20813 +%mend em_dsep;
20814 +%em_dsep;
NOTE: %INCLUDE (level 1) ending.
NOTE: Fileref _DGMFRF has been deassigned.
WIP_ACTION: CLOSESESSION
DGMID: EMWS6
LOCKFILE: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\System\wsopen.lck
NOTE: Libref EMWS6 was successfully assigned as follows:
Engine: V9
Physical Name: C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6
NOTE: There were 9 observations read from the data set EMWS6.EM_DGRAPH.
NOTE: The data set WORK.EM_DGRAPH has 9 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
20815 %let syscc=0;
20816 filename _wipchk catalog "EMWS6.EndGrp.test.source";
20817 data _null_;
20818 file _wipchk;
20819 put '/* Test */';
20820 run;
NOTE: The file _WIPCHK is:
Catalog Name=EMWS6.ENDGRP.TEST.SOURCE,
Catalog Page Size=4096,
Number of Catalog Pages=11,
Created=Monday, April 27, 2015 02:47:34 PM,
Last Modified=Monday, April 27, 2015 03:17:32 PM,
Filename=C:\Users\Josh New\Downloads\Practicum Project\practicum_model\Workspaces\EMWS6\endgrp.sas7bcat,
Release Created=9.0301M2,
Host Created=X64_8HOME
NOTE: 1 record was written to the file _WIPCHK.
The minimum record length was 10.
The maximum record length was 10.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20821 data _null_;
20822 rc = fdelete('_wipchk');
20823 run;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
20824 filename _wipchk;
NOTE: Fileref _WIPCHK has been deassigned.
20825 filename _wipxml 'C:\Users\JOSHNE~1\AppData\Local\Temp\SAS Temporary Files\_TD2700_JOSHMOBILE_\Prc2\DiagramOpenSessionResponse.xml' encoding="UTF-8" NOBOM;
WARNING: End of file.
WARNING: End of file.
Hey Josh,
Glad you are giving the Start/End Group nodes a try!
I could not find an error in your log.
Let's troubleshoot a bit. Try the same flows as on page 3 with the HMEQ data set.
To generate HMEQ, go to Help -> Generate Sample Data -> Home Equity.
Then create a flow such as Start Groups (Mode=Bagging) -> Tree -> End Groups.
Does that work now?
99% of the time, my problem is that I forgot to set the mode to Bagging or Boosting. It might be something else in your case, though.
Thanks,
Miguel
Thanks for the reply. I ran the Home Equity data and it worked perfectly. I think the problem was that instead of using a SAS data table as my data source, I was using the File Import node. I went back, converted my CSV file into a SAS data table in Enterprise Guide, imported that into EM, and now it works fine. Why is this a problem?
Not sure why that would be a problem. I will give it a try with the File Import node and keep you posted.
Good luck with your ensemble models!
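If you would rather script the CSV-to-SAS-table conversion than use the Enterprise Guide import wizard, a minimal sketch with PROC IMPORT follows. The thread does not establish why the File Import route failed; this only shows one way to produce the SAS data table that worked. The folder, file, libref (`pracdat`), and data set names are hypothetical placeholders, and the folder named in the LIBNAME statement is assumed to exist already.

```sas
/* Minimal sketch: turn a CSV file into a permanent SAS data set.          */
/* All paths and names below are hypothetical placeholders.                */
libname pracdat "C:\hypothetical\practicum_data";   /* folder for the SAS table */

proc import datafile="C:\hypothetical\practicum.csv" /* hypothetical CSV file */
            out=pracdat.practicum                   /* permanent SAS data set */
            dbms=csv
            replace;                                /* overwrite if it already exists */
    getnames=yes;        /* first row holds the column names */
    guessingrows=32767;  /* scan many rows before choosing each column's type and length */
run;

/* Check the variable types and lengths PROC IMPORT chose */
proc contents data=pracdat.practicum;
run;
```

For EM to see the table, the same library generally also has to be assigned in the EM session (for example, in the project start code) before you point a data source at it.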
I had a similar experience: using the File Import node with a CSV file failed. So instead I opened the CSV file in Excel, set the format of each column appropriately, and imported it to our SAS server. Using the Input Data node, bagging worked as advertised. Fantastic!
JPW,
Also, we should be looking at the log of the node that errored out.
Right-click that node and select Results. Once the results open, go to View -> SAS Results -> Log.
Thanks,