My previous tip on cross validation shows how to compare three trained models (regression, random forest, and gradient boosting) based on their 5-fold cross validation training errors in SAS Enterprise Miner. This tip is the second installment about using cross validation in SAS Enterprise Miner and builds on the diagram that is used in the first tip.
In addition to comparing models on their 5-fold cross validation training errors, this tip shows how to obtain a 5-fold cross validation testing error, which makes for a more complete SAS Enterprise Miner flow (shown below).
First a quick note about how k-fold cross validation training and testing errors are calculated:
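As a rough illustration outside Enterprise Miner, the standard k-fold calculation can be sketched in Base SAS as follows. Everything here is an assumption made for the sketch: the toy data set `toy`, its variables `x`, `y`, and `fold`, the macro name `kfold_error`, the use of PROC LOGISTIC as the model, and the 0.5 cutoff. The bookkeeping inside the Start Groups and End Groups nodes may differ in detail.

```sas
/* Hypothetical toy data: binary target y, one input x, fold label 1-5. */
data toy;
  call streaminit(2024);
  do id = 1 to 500;
    x = rand('normal');
    y = (rand('uniform') < 1 / (1 + exp(-x)));
    fold = mod(id, 5) + 1;
    output;
  end;
run;

%macro kfold_error(k=5);
  %local i;
  %do i = 1 %to &k;
    /* Fit on the other k-1 folds, then score the held-out fold. */
    proc logistic data=toy(where=(fold ne &i)) noprint;
      model y(event='1') = x;
      score data=toy(where=(fold = &i)) out=scored&i;
    run;
  %end;

  /* Stack the held-out predictions and flag misclassified rows. */
  data cv_scored;
    set %do i = 1 %to &k; scored&i %end; ;
    miss = ((P_1 >= 0.5) ne y);   /* 0.5 cutoff is an assumption */
  run;

  /* Error rate of each held-out fold. */
  proc means data=cv_scored noprint nway;
    class fold;
    var miss;
    output out=fold_err mean=fold_error;
  run;

  /* The k-fold cross validation error is the mean of the fold errors. */
  proc means data=fold_err mean;
    var fold_error;
  run;
%mend kfold_error;

%kfold_error(k=5)
```

The testing error in this tip is handled differently: each of the five fold models scores the whole test set, and the five predicted probabilities for each test row are averaged before the error is computed.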
The following is a step-by-step explanation of the preceding Enterprise Miner flow. You can run this process flow by using the attached XML file.
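/* Give every test row a unique ID so its five scored copies can be matched later. */
data temptest;
  set &EM_IMPORT_TEST;
  ID = _N_;
run;

/* Stack five copies of the test set and label each copy with a fold number. */
data &EM_EXPORT_TEST;
  set temptest(in=in1) temptest(in=in2) temptest(in=in3)
      temptest(in=in4) temptest(in=in5);
  if in1 then _fold_ = 1;
  else if in2 then _fold_ = 2;
  else if in3 then _fold_ = 3;
  else if in4 then _fold_ = 4;
  else if in5 then _fold_ = 5;
run;
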
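/* Sort the imported test data by ID. */
proc sort data=&EM_IMPORT_TEST out=&EM_EXPORT_TEST;
  by ID;
run;

/* Pass the imported training data through unchanged. */
data &EM_EXPORT_TRAIN;
  set &EM_IMPORT_DATA;
run;

/* Split the scored test data into one data set per fold. */
data test1 test2 test3 test4 test5;
  set &EM_IMPORT_TEST;
  if _fold_ = 1 then output test1;
  if _fold_ = 2 then output test2;
  if _fold_ = 3 then output test3;
  if _fold_ = 4 then output test4;
  if _fold_ = 5 then output test5;
run;

/* Merge the five fold data sets by ID and average the predicted probabilities. */
/* The merge requires each of test1-test5 to be in ID order. */
data &EM_EXPORT_TEST;
  merge test1(rename=(P_BAD1 = P_BAD_1)) test2(rename=(P_BAD1 = P_BAD_2))
        test3(rename=(P_BAD1 = P_BAD_3)) test4(rename=(P_BAD1 = P_BAD_4))
        test5(rename=(P_BAD1 = P_BAD_5));
  by ID;
  P_BAD1 = (P_BAD_1 + P_BAD_2 + P_BAD_3 + P_BAD_4 + P_BAD_5) / 5;
run;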
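The split-and-merge step above amounts to averaging P_BAD1 over the five folds for each ID. As a minimal sketch of the same idea on made-up data, PROC MEANS with a CLASS statement can compute that average in one pass. The data set names `stacked` and `avg_pred` and the toy values are assumptions for illustration, not part of the attached flow.

```sas
/* Hypothetical stacked scores: 3 test rows, each scored by 5 fold models. */
data stacked;
  do _fold_ = 1 to 5;
    do ID = 1 to 3;
      P_BAD1 = 0.1 * ID + 0.01 * _fold_;
      output;
    end;
  end;
run;

/* Average the five predictions per ID without splitting and merging. */
proc means data=stacked noprint nway;
  class ID;
  var P_BAD1;
  output out=avg_pred(drop=_type_ _freq_) mean=P_BAD1;
run;

/* Averaged P_BAD1 is 0.13, 0.23, and 0.33 for ID 1, 2, and 3. */
proc print data=avg_pred noobs;
run;
```

Two differences from the merge are worth keeping in mind. The MEAN statistic skips missing values, whereas the sum divided by 5 returns a missing value if any fold prediction is missing. The output also holds only ID and the averaged P_BAD1, so the target and any other columns would have to be joined back by ID.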
Note that you can obtain the cross validated predictions of the test set by saving the exported data (&EM_EXPORT_TEST) of the SAS Code 2 node.
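Once the averaged test predictions are saved, a testing error can be computed from them directly. The sketch below assumes a saved copy named `cv_test` (hypothetical), a binary target named BAD (inferred from the P_BAD1 variable name), made-up values, and a 0.5 classification cutoff.

```sas
/* Hypothetical saved copy of the averaged test predictions. */
data cv_test;
  input ID BAD P_BAD1;
  datalines;
1 0 0.10
2 1 0.80
3 0 0.60
4 1 0.40
;
run;

/* Flag misclassified rows and compute the squared error per row. */
data cv_test_err;
  set cv_test;
  miss   = ((P_BAD1 >= 0.5) ne BAD);   /* 0.5 cutoff is an assumption */
  sq_err = (BAD - P_BAD1)**2;
run;

/* Mean of miss = misclassification rate (0.5 here);          */
/* mean of sq_err = average squared error (0.1925 here).      */
proc means data=cv_test_err mean;
  var miss sq_err;
run;
```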
If you run this flow diagram or replicate this analysis for your own data, make sure that you run each Start Groups/End Groups block separately, because multiple looping actions do not work at the same time.
Thanks a lot to Ralph Abbey for his help in putting this together.
Hi @Funda_SAS, thanks for the great article. I've tried implementing the flow diagram but seem to be facing an issue, namely:
the ID variable generated in the code snippet below doesn't seem to be passed to the Start Groups nodes, resulting in an error in the SAS Code 2 node when it tries to sort the imported test data set by ID.
data temptest;
set &EM_import_TEST;
ID = _N_;
run;
Would appreciate any help on the aforementioned issue. Thanks!
Hi @abelroch ,
If you are using the attached XML file directly, try creating your own data sources. Also, make sure the Input Data node has the Test role; by default, the Role property is set to Raw.
Hope this helps!
Funda
Can this approach be used for HP Forest when optimization is used?
I was reading the Enterprise Miner reference, and noticed the following:
What is stated in the documentation is true. The cross validation mode in the Start Groups and End Groups nodes does not support the HP Forest node, mainly because scoring for HP Forest and HP SVM is done differently than for other HP nodes.