Hi @mh2t,
Are you using SAS Studio to develop your code? If so, then I suggest that you take a look at the tasks (specifically, the Partitioning, Gradient Boosting, and Assess tasks) because they can expedite your code development.
As @StatDave mentioned, a convenient way to organize your data is to keep one data table with an indicator variable that denotes which partition each observation belongs to. One benefit of this approach is that, when you estimate your model with the PARTITION statement, some performance metrics for the validation and test partitions are calculated automatically, so you don't need to compute them in a separate step.
For example, the following code creates a CAS session, loads SASHELP.CARS as an in-memory table, and partitions that table into three sets (the PROC PARTITION code is from the Partitioning task):
/* Connect to CAS */
cas;
libname mylib cas caslib="casuser";

/* Load data into memory */
data mylib.cars;
   set sashelp.cars;
run;

/* Partition data set */
proc partition data=mylib.cars partind samppct=30 samppct2=10;
   output out=mylib.cars;
run;
Now the data table MYLIB.CARS has a new _PartInd_ column, where 0 corresponds to the training set, 1 to the validation set, and 2 to the test set.
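To confirm the split before modeling, a quick frequency count of the indicator can help. This is only a minimal sketch that assumes the partitioned table and the _PartInd_ column created above. With SAMPPCT=30 and SAMPPCT2=10, roughly 60% of the rows should land in partition 0, 30% in partition 1, and 10% in partition 2.

```sas
/* Count observations in each partition (0=train, 1=validate, 2=test) */
proc freq data=mylib.cars;
   tables _PartInd_;
run;
```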
You can then use this data table with the PARTITION statement in PROC GRADBOOST, as in the following code (generated by the Gradient Boosting task):
proc gradboost data=MYLIB.CARS outmodel=mylib.savedModel;
   partition role=_PartInd_ (validate='1' test='2' train='0');
   target Origin / level=nominal;
   input MSRP EngineSize / level=interval;
   input DriveTrain / level=nominal;
   ods output FitStatistics=work.Gradboost_fit;
   score out=mylib.scored copyvars=(Origin MSRP EngineSize DriveTrain _PartInd_);
run;
The results show that the procedure automatically calculates fit statistics for all three partitions.
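Because the PROC GRADBOOST step above also writes the FitStatistics ODS table to work.Gradboost_fit, one way to look at those statistics again later is simply to print that data set. This is a sketch that assumes the step ran successfully and created the table. The exact columns depend on the procedure's output, so none are named here.

```sas
/* Print the fit statistics captured by the ODS OUTPUT statement */
proc print data=work.Gradboost_fit;
run;
```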
You could also use the saved model (mylib.savedModel) with PROC GRADBOOST to score the validation set, as in the following code:
proc gradboost data=MYLIB.CARS(where=(_partind_=1)) inmodel=mylib.savedModel;
   output out=mylib.valscored copyvars=(_all_);
run;
The fit statistics from this scoring run match those that PROC GRADBOOST produced for the validation set when you estimated the model (compare with the previous results).
But again, by organizing your data partitions into the same table and by using the PARTITION statement, SAS automatically calculates these fit statistics when you estimate your model. You can also use the scored data table (mylib.scored) with the Assess task for additional model assessment.
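As a simple additional check outside the Assess task, you could cross-tabulate the actual and predicted classes within each partition of the scored table. This is a hedged sketch: it assumes the scored table contains a predicted-class column named I_Origin, which is my assumption based on the usual I_ prefix for a nominal target and is not confirmed by the output above. The PROC CONTENTS step lists the actual column names so that you can adjust the TABLES statement if needed.

```sas
/* List the columns in the scored table to confirm the predicted-class name */
proc contents data=mylib.scored;
run;

/* Actual vs. predicted Origin within each partition.
   I_Origin is an assumed column name; change it if PROC CONTENTS shows otherwise. */
proc freq data=mylib.scored;
   tables _PartInd_*Origin*I_Origin / norow nocol nopercent;
run;
```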
Does this help?
-Brian