<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to use validate and test datasets manually in PROC GRADBOOST? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668074#M8352</link>
    <description>&lt;P&gt;I split my dataset into three separate SAS data sets: train, validate, and test.&lt;/P&gt;
&lt;P&gt;I want to build a GBM model on the train set, evaluate it on the validate set, and predict on the test set. How can I use the &lt;STRONG&gt;validate&lt;/STRONG&gt; and &lt;STRONG&gt;test&lt;/STRONG&gt; sets explicitly in my code for model checking and prediction?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gradboost data=mylib.train outmodel=mylib.savedModel seed=12345;
   input &amp;amp;myVars / level=nominal;
   target Y / level=nominal;
   ods output FitStatistics=fitstats;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Your help would be greatly appreciated!&lt;/P&gt;</description>
    <pubDate>Thu, 09 Jul 2020 15:43:20 GMT</pubDate>
    <dc:creator>mh2t</dc:creator>
    <dc:date>2020-07-09T15:43:20Z</dc:date>
    <item>
      <title>How to use validate and test datasets manually in PROC GRADBOOST?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668074#M8352</link>
      <description>&lt;P&gt;I split my dataset into three separate SAS data sets: train, validate, and test.&lt;/P&gt;
&lt;P&gt;I want to build a GBM model on the train set, evaluate it on the validate set, and predict on the test set. How can I use the &lt;STRONG&gt;validate&lt;/STRONG&gt; and &lt;STRONG&gt;test&lt;/STRONG&gt; sets explicitly in my code for model checking and prediction?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gradboost data=mylib.train outmodel=mylib.savedModel seed=12345;
   input &amp;amp;myVars / level=nominal;
   target Y / level=nominal;
   ods output FitStatistics=fitstats;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Your help would be greatly appreciated!&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jul 2020 15:43:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668074#M8352</guid>
      <dc:creator>mh2t</dc:creator>
      <dc:date>2020-07-09T15:43:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to use validate and test datasets manually in PROC GRADBOOST?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668081#M8353</link>
      <description>&lt;P&gt;Concatenate your separate data sets into a single data set with an added variable that takes a distinct value for the training, validation, and test observations. Then add a PARTITION statement to your PROC GRADBOOST step. For example, if the added variable is named ObsType with values "trn", "val", and "tst":&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;partition role=ObsType(train='trn' validate='val' test='tst');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
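&lt;P&gt;Putting it together, the steps might look like this (a sketch that reuses the table names, macro variable, and target from your question; adjust to your actual data):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Combine the three tables and flag each observation's role */
data mylib.all;
   set mylib.train    (in=trn)
       mylib.validate (in=val)
       mylib.test     (in=tst);
   length ObsType $3;
   if trn then ObsType='trn';
   else if val then ObsType='val';
   else if tst then ObsType='tst';
run;

/* Use the role variable in the PARTITION statement */
proc gradboost data=mylib.all outmodel=mylib.savedModel seed=12345;
   partition role=ObsType(train='trn' validate='val' test='tst');
   input &amp;amp;myVars / level=nominal;
   target Y / level=nominal;
   ods output FitStatistics=fitstats;
run;&lt;/CODE&gt;&lt;/PRE&gt;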
&lt;P&gt;See the documentation for details on this statement.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jul 2020 15:51:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668081#M8353</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2020-07-09T15:51:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to use validate and test datasets manually in PROC GRADBOOST?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668181#M8354</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/160501"&gt;@mh2t&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using SAS Studio to develop your code? If so, then I suggest that you take a look at the tasks (specifically, the &lt;A href="https://go.documentation.sas.com/?activeCdc=webeditorcdc&amp;amp;cdcId=sasstudiocdc&amp;amp;cdcVersion=5.2&amp;amp;docsetId=webeditorref&amp;amp;docsetTarget=n0ds8g9ukjksn7n1rste12p7k9os.htm&amp;amp;locale=en&amp;amp;docsetVersion=5.2" target="_self"&gt;Partitioning&lt;/A&gt;, &lt;A href="https://go.documentation.sas.com/?activeCdc=webeditorcdc&amp;amp;cdcId=sasstudiocdc&amp;amp;cdcVersion=5.2&amp;amp;docsetId=webeditorref&amp;amp;docsetTarget=n15kxu82udmrj3n1ms2tjnnd868d.htm&amp;amp;locale=en" target="_self"&gt;Gradient Boosting&lt;/A&gt;, and &lt;A href="https://go.documentation.sas.com/?activeCdc=webeditorcdc&amp;amp;cdcId=sasstudiocdc&amp;amp;cdcVersion=5.2&amp;amp;docsetId=webeditorref&amp;amp;docsetTarget=p09ep11ax4m4lcn1o5fljfb7dsxn.htm&amp;amp;locale=en&amp;amp;docsetVersion=5.2" target="_self"&gt;Assess&lt;/A&gt; tasks) because they can expedite your code development.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13633"&gt;@StatDave&lt;/a&gt;&amp;nbsp;mentioned, a convenient way to organize your data is to have one data table with an indicator variable that denotes which partition each observation belongs to. One benefit of this approach is that, when you estimate your model with the PARTITION statement, performance metrics for the validation and test partitions are calculated automatically, so you don't need to compute them in a separate step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, the following code creates a CAS session, loads SASHELP.CARS as an in-memory table, and partitions that table into three sets (the PROC PARTITION code is from the Partitioning task):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Connect to CAS */
cas;
libname mylib cas caslib="casuser";

/* Load data into memory */
data mylib.cars; 
   set sashelp.cars; 
run;

/* Partition data set */
proc partition data=mylib.cars partind samppct=30 samppct2=10;
	output out=mylib.cars;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now the data table MYLIB.CARS has a new _PartInd_ column, where 0 corresponds to the training set, 1 to the validation set, and 2 to the test set.&lt;/P&gt;
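&lt;P&gt;If you want to confirm the split, a quick frequency table of _PartInd_ shows how many rows landed in each partition (with SAMPPCT=30 and SAMPPCT2=10 you should see roughly a 60/30/10 split):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Optional check: row counts per partition */
proc freq data=mylib.cars;
   tables _PartInd_;
run;&lt;/CODE&gt;&lt;/PRE&gt;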
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can then use this data table with the PARTITION statement in PROC GRADBOOST, as is done with the following code (generated by the Gradient Boosting task):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gradboost data=MYLIB.CARS outmodel=mylib.savedModel;
	partition role=_PartInd_ (validate='1' test='2' train='0');
	target Origin / level=nominal;
	input MSRP EngineSize / level=interval;
	input DriveTrain / level=nominal;
	ods output FitStatistics=work.Gradboost_fit;
	score out=mylib.scored copyvars=(Origin MSRP EngineSize DriveTrain _PartInd_);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You can see in the results that the procedure automatically calculates fit statistics for all three partitions:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="gradboostResults.PNG" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/47046iE07937AA9DA1DD3E/image-size/large?v=v2&amp;amp;px=999" role="button" title="gradboostResults.PNG" alt="gradboostResults.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could also use the saved model (mylib.savedModel) and PROC GRADBOOST to score the validation set, as in the following code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gradboost data=MYLIB.CARS(where=(_partind_=1)) inmodel=mylib.savedModel;
	output out=mylib.valscored copyvars=(_all_);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;And you can see that the fit statistics match those produced by PROC GRADBOOST for the validation set when you estimated the model (compare with the previous results):&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="gradboostValResultsInmodel.PNG" style="width: 412px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/47047i82591114B06DDBDF/image-size/large?v=v2&amp;amp;px=999" role="button" title="gradboostValResultsInmodel.PNG" alt="gradboostValResultsInmodel.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
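&lt;P&gt;For a quick accuracy check on the scored validation table, you could also cross-tabulate the actual and predicted classes. (This assumes the scored output includes a predicted-class column named I_Origin, the conventional "into" naming; check the columns of mylib.valscored first.)&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Actual vs. predicted class on the validation partition */
proc freq data=mylib.valscored;
   tables Origin*I_Origin / nopercent norow nocol;
run;&lt;/CODE&gt;&lt;/PRE&gt;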
&lt;P&gt;But again, if you organize your data partitions into the same table and use the PARTITION statement, SAS calculates these fit statistics automatically when you estimate your model. You can also use the scored data table (mylib.scored) with the Assess task for additional model assessment.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does this help?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;-Brian&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jul 2020 20:28:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-use-validate-and-test-datasets-manually-in-PROC-GRADBOOST/m-p/668181#M8354</guid>
      <dc:creator>BrianGaines</dc:creator>
      <dc:date>2020-07-09T20:28:58Z</dc:date>
    </item>
  </channel>
</rss>

