<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Proc HPForest in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/508993#M7470</link>
    <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;I'm new to SAS programming and am currently using SAS Studio (not Viya).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is: how do I split the data into training and validation sets using the code below?&lt;/P&gt;&lt;P&gt;The output from this script only shows results for the training data, not the validation portion. Kindly guide me on this.&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS Code:&lt;/P&gt;&lt;P&gt;%MACRO HPFOREST(VARS=);&lt;BR /&gt;PROC HPFOREST DATA=WORK.ABC&lt;BR /&gt;MAXTREES=500&lt;BR /&gt;VARS_TO_TRY=&amp;amp;VARS.;&lt;BR /&gt;&lt;BR /&gt;TARGET AAA / LEVEL=BINARY;&lt;BR /&gt;INPUT AA BB CC DD / LEVEL=INTERVAL;&lt;BR /&gt;ods output FitStatistics = fitstats_vars&amp;amp;Vars.(rename=(Miscoob=VarsToTry&amp;amp;Vars.));&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;%mend;&lt;BR /&gt;%hpforest(vars=all);&lt;BR /&gt;%hpforest(vars=40);&lt;BR /&gt;%hpforest(vars=26);&lt;BR /&gt;%hpforest(vars=7);&lt;BR /&gt;%hpforest(vars=2);&lt;BR /&gt;&lt;BR /&gt;data fitstats;&lt;BR /&gt;merge&lt;BR /&gt;fitstats_varsall&lt;BR /&gt;fitstats_vars40&lt;BR /&gt;fitstats_vars26&lt;BR /&gt;fitstats_vars7&lt;BR /&gt;fitstats_vars2;&lt;BR /&gt;rename Ntrees=Trees;&lt;BR /&gt;label VarsToTryAll = "Vars=All";&lt;BR /&gt;label VarsToTry40 = "Vars=40";&lt;BR /&gt;label VarsToTry26 = "Vars=26";&lt;BR /&gt;label VarsToTry7 = "Vars=7";&lt;BR /&gt;label VarsToTry2 = "Vars=2";&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;/* PLOT MISCLASSIFICATION RATE VS. VARIABLES TO TRY */&lt;BR /&gt;proc sgplot data=fitstats;&lt;BR /&gt;title "Misclassification Rate for Various VarsToTry Values";&lt;BR /&gt;series x=Trees y=VarsToTryAll/lineattrs=(Color=black);&lt;BR /&gt;series x=Trees y=VarsToTry40/lineattrs=(Pattern=ShortDash Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry26/lineattrs=(Pattern=ShortDash Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry7/lineattrs=(Pattern=MediumDashDotDot Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry2/lineattrs=(Pattern=LongDash Thickness=2);&lt;BR /&gt;yaxis label='OOB Misclassification Rate';&lt;BR /&gt;run;&lt;/P&gt;</description>
    <pubDate>Wed, 31 Oct 2018 02:45:27 GMT</pubDate>
    <dc:creator>jwong7</dc:creator>
    <dc:date>2018-10-31T02:45:27Z</dc:date>
    <item>
      <title>Proc HPForest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/508993#M7470</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;I'm new to SAS programming and am currently using SAS Studio (not Viya).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is: how do I split the data into training and validation sets using the code below?&lt;/P&gt;&lt;P&gt;The output from this script only shows results for the training data, not the validation portion. Kindly guide me on this.&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS Code:&lt;/P&gt;&lt;P&gt;%MACRO HPFOREST(VARS=);&lt;BR /&gt;PROC HPFOREST DATA=WORK.ABC&lt;BR /&gt;MAXTREES=500&lt;BR /&gt;VARS_TO_TRY=&amp;amp;VARS.;&lt;BR /&gt;&lt;BR /&gt;TARGET AAA / LEVEL=BINARY;&lt;BR /&gt;INPUT AA BB CC DD / LEVEL=INTERVAL;&lt;BR /&gt;ods output FitStatistics = fitstats_vars&amp;amp;Vars.(rename=(Miscoob=VarsToTry&amp;amp;Vars.));&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;%mend;&lt;BR /&gt;%hpforest(vars=all);&lt;BR /&gt;%hpforest(vars=40);&lt;BR /&gt;%hpforest(vars=26);&lt;BR /&gt;%hpforest(vars=7);&lt;BR /&gt;%hpforest(vars=2);&lt;BR /&gt;&lt;BR /&gt;data fitstats;&lt;BR /&gt;merge&lt;BR /&gt;fitstats_varsall&lt;BR /&gt;fitstats_vars40&lt;BR /&gt;fitstats_vars26&lt;BR /&gt;fitstats_vars7&lt;BR /&gt;fitstats_vars2;&lt;BR /&gt;rename Ntrees=Trees;&lt;BR /&gt;label VarsToTryAll = "Vars=All";&lt;BR /&gt;label VarsToTry40 = "Vars=40";&lt;BR /&gt;label VarsToTry26 = "Vars=26";&lt;BR /&gt;label VarsToTry7 = "Vars=7";&lt;BR /&gt;label VarsToTry2 = "Vars=2";&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;/* PLOT MISCLASSIFICATION RATE VS. VARIABLES TO TRY */&lt;BR /&gt;proc sgplot data=fitstats;&lt;BR /&gt;title "Misclassification Rate for Various VarsToTry Values";&lt;BR /&gt;series x=Trees y=VarsToTryAll/lineattrs=(Color=black);&lt;BR /&gt;series x=Trees y=VarsToTry40/lineattrs=(Pattern=ShortDash Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry26/lineattrs=(Pattern=ShortDash Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry7/lineattrs=(Pattern=MediumDashDotDot Thickness=2);&lt;BR /&gt;series x=Trees y=VarsToTry2/lineattrs=(Pattern=LongDash Thickness=2);&lt;BR /&gt;yaxis label='OOB Misclassification Rate';&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Oct 2018 02:45:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/508993#M7470</guid>
      <dc:creator>jwong7</dc:creator>
      <dc:date>2018-10-31T02:45:27Z</dc:date>
    </item>
    <item>
      <title>Re: Proc HPForest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/509089#M7471</link>
      <description>&lt;P&gt;Do you have a partition variable in your data?&amp;nbsp; If so, use the PARTITION statement in PROC HPFOREST and give the values that identify each partition (here, 1 is for training, 0 for validation):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;partition rolevar= &lt;EM&gt;&lt;STRONG&gt;your_partition_var&lt;/STRONG&gt;&amp;nbsp;&lt;/EM&gt;(TRAIN='1' VALIDATE='0');&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
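&lt;P&gt;For example, here is a minimal sketch that places the statement in a full PROC HPFOREST call. It reuses the data set and variable names from your post and assumes a hypothetical partition variable named part_flag, coded 1 for training and 0 for validation. Both that name and the output data set name fitstats_part are placeholders, so adjust them to your data:&lt;/P&gt;
&lt;PRE&gt;/* Assumption: WORK.ABC already contains part_flag (hypothetical name), */
/* coded 1 for training rows and 0 for validation rows.                */
proc hpforest data=work.abc maxtrees=500 vars_to_try=2;
   target AAA / level=binary;
   input AA BB CC DD / level=interval;
   /* Assign each row to a role based on the value of part_flag */
   partition rolevar=part_flag (train='1' validate='0');
   /* Keep the fit statistics table for later inspection or plotting */
   ods output FitStatistics=fitstats_part;
run;&lt;/PRE&gt;
&lt;P&gt;With a validation partition defined, the fit statistics output should report validation results alongside the training and out-of-bag ones.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;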
&lt;P&gt;If you don't already have a partition variable, you can use PROC HPSAMPLE with the PARTITION option to create one:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/prochp/68141/HTML/default/viewer.htm#prochp_hpsample_overview.htm" target="_self"&gt;http://support.sas.com/documentation/cdl/en/prochp/68141/HTML/default/viewer.htm#prochp_hpsample_overview.htm&lt;/A&gt;.&amp;nbsp; Then use the PARTITION statement in PROC HPFOREST as above.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Oct 2018 13:03:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/509089#M7471</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2018-10-31T13:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Proc HPForest</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/510958#M7480</link>
      <description>&lt;P&gt;Dear Wendy,&lt;/P&gt;&lt;P&gt;Thank you for your great help. Yes, I am able to use HPSAMPLE to split my data for the HPFOREST procedure, as below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;proc hpsample data=&amp;amp;prepped_data. out=hpforest.split sampobs=640 seed=1234567 partition;&lt;BR /&gt; Class &amp;amp;target. ;&lt;BR /&gt; var &amp;amp;interval_inputs. ;&lt;BR /&gt;run;&lt;BR /&gt;Proc Freq data=hpforest.split;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;PROC HPFOREST DATA=&amp;amp;prepped_data1.&lt;BR /&gt;MAXTREES= 200&lt;BR /&gt;VARS_TO_TRY=5&lt;BR /&gt;seed=600&lt;BR /&gt;trainfraction=0.6&lt;BR /&gt;maxdepth=50&lt;BR /&gt;leafsize=6&lt;BR /&gt;alpha= 0.1;&lt;BR /&gt;&lt;BR /&gt;TARGET &amp;amp;target. / LEVEL=Nominal;&lt;BR /&gt;INPUT &amp;amp;interval_inputs. /LEVEL=INTERVAL;&lt;BR /&gt;Partition roleVar=_partind_(train='1' validate='0');&lt;BR /&gt;ODS OUTPUT VARIABLEIMPORTANCE=LOSS_REDUCTION_IMPORTANCE;&lt;BR /&gt;ODS OUTPUT FITSTATISTICS=FIR_STATISTICS;&lt;BR /&gt;save file="&amp;amp;outdir/CTGF.sas";&lt;BR /&gt;&lt;BR /&gt;RUN;&lt;BR /&gt;&lt;BR /&gt;/* THE STATEMENTS BELOW PRINT SPECIFIC OUTPUT TABLES. */&lt;BR /&gt;PROC PRINT DATA=WORK.FIR_STATISTICS;&lt;BR /&gt;PROC PRINT DATA=WORK.LOSS_REDUCTION_IMPORTANCE;&lt;BR /&gt;run;&lt;/PRE&gt;&lt;P&gt;Allow me to ask some further questions on HPFOREST.&lt;/P&gt;&lt;P&gt;Q1. How can I use the partition to split LOSS_REDUCTION_IMPORTANCE into separate training and validation graphs?&lt;BR /&gt;Q2. How can I score the HPFOREST model? I could not find the right information on how to do so.&lt;/P&gt;&lt;P&gt;Kindly guide me on this.&lt;/P&gt;&lt;P&gt;Many thanks!&lt;BR /&gt;Jimmy&lt;/P&gt;</description>
      <pubDate>Wed, 07 Nov 2018 06:27:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Proc-HPForest/m-p/510958#M7480</guid>
      <dc:creator>jwong7</dc:creator>
      <dc:date>2018-11-07T06:27:22Z</dc:date>
    </item>
  </channel>
</rss>

