jwong7
Calcite | Level 5

Hi Guys

I'm new to SAS programming and am currently using SAS Studio (not Viya).

My question is: how do I split my data into training and validation sets using the code below?

The output from this script shows only the training portion, not the validation portion. Kindly guide me on this.

Many thanks!

SAS Code:

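%MACRO HPFOREST(VARS=);
   /* Fit a forest with the given VARS_TO_TRY= value and keep its fit statistics.   */
   /* There is no PARTITION statement, so every row is used for training and only  */
   /* training/out-of-bag (OOB) statistics are reported.                            */
   PROC HPFOREST DATA=WORK.ABC
         MAXTREES=500
         VARS_TO_TRY=&VARS.;
      TARGET AAA / LEVEL=BINARY;
      INPUT AA BB CC DD / LEVEL=INTERVAL;
      /* Rename the OOB misclassification column so each run gets its own column */
      ODS OUTPUT FitStatistics=fitstats_vars&VARS.(rename=(MiscOOB=VarsToTry&VARS.));
   RUN;
%MEND;

%hpforest(vars=all);
%hpforest(vars=40);
%hpforest(vars=26);
%hpforest(vars=7);
%hpforest(vars=2);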

/* One-to-one merge (no BY statement): each table has one row per number of trees, */
/* so the rows line up across runs.                                                */
data fitstats;
   merge
      fitstats_varsall
      fitstats_vars40
      fitstats_vars26
      fitstats_vars7
      fitstats_vars2;
   rename Ntrees=Trees;
   label VarsToTryAll = "Vars=All";
   label VarsToTry40 = "Vars=40";
   label VarsToTry26 = "Vars=26";
   label VarsToTry7 = "Vars=7";
   label VarsToTry2 = "Vars=2";
run;

/* Plot OOB misclassification rate against the number of trees for each VARS_TO_TRY value */
proc sgplot data=fitstats;
   title "Misclassification Rate for Various VarsToTry Values";
   series x=Trees y=VarsToTryAll / lineattrs=(Color=black);
   series x=Trees y=VarsToTry40 / lineattrs=(Pattern=ShortDash Thickness=2);
   series x=Trees y=VarsToTry26 / lineattrs=(Pattern=ShortDash Thickness=2);
   series x=Trees y=VarsToTry7 / lineattrs=(Pattern=MediumDashDotDot Thickness=2);
   series x=Trees y=VarsToTry2 / lineattrs=(Pattern=LongDash Thickness=2);
   yaxis label='OOB Misclassification Rate';
run;

 

 

WendyCzika
SAS Employee

Do you have a partition variable in your data? If so, use the PARTITION statement in PROC HPFOREST and specify the values that indicate each partition (here, 1 is for training and 0 is for validation):

   partition rolevar=your_partition_var(TRAIN='1' VALIDATE='0');


If you don't already have a partition variable, you can use PROC HPSAMPLE with the PARTITION option to create one:

http://support.sas.com/documentation/cdl/en/prochp/68141/HTML/default/viewer.htm#prochp_hpsample_ove....  Then use the PARTITION statement in PROC HPFOREST as above.
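Applied to the WORK.ABC example from the original question, the two steps above might look like the sketch below. It assumes a 70/30 split (SAMPPCT=70), an arbitrary seed, and new data set names WORK.ABC_PART and FITSTATS_PART; none of these come from the thread. As far as we know, PROC HPSAMPLE keeps only the variables named in its CLASS and VAR statements in OUT=, so the target and all inputs are listed there.

```sas
/* Sketch: 70/30 split of WORK.ABC. SAMPPCT=70, SEED=12345 and the output   */
/* names WORK.ABC_PART / FITSTATS_PART are illustrative assumptions.        */
proc hpsample data=work.abc out=work.abc_part
              samppct=70 seed=12345 partition;
   class aaa;              /* target, listed so it is kept in OUT=          */
   var aa bb cc dd;        /* interval inputs, listed so they are kept      */
run;

/* Check the partition sizes: _PartInd_ = 1 (training) and 0 (validation) */
proc freq data=work.abc_part;
   tables _partind_;
run;

/* Same forest as before, now with a validation partition */
proc hpforest data=work.abc_part maxtrees=500 vars_to_try=2 seed=600;
   target aaa / level=binary;
   input aa bb cc dd / level=interval;
   partition rolevar=_partind_(train='1' validate='0');
   ods output FitStatistics=fitstats_part;
run;
```

With the PARTITION statement in place, the FitStatistics table should report validation statistics alongside the training ones; check the exact column names with PROC CONTENTS before renaming them in the macro.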

jwong7
Calcite | Level 5

Dear Wendy 

Thank you for your great help. Yes, I was able to use PROC HPSAMPLE to split my data for PROC HPFOREST, as shown below:

/* Split the data; PARTITION adds the indicator variable _PartInd_ to the output */
proc hpsample data=&prepped_data. out=hpforest.split sampobs=640 seed=1234567 partition;
   class &target.;
   var &interval_inputs.;
run;

/* Check the size of each partition */
proc freq data=hpforest.split;
   tables _partind_;
run;

/* DATA= must be the HPSAMPLE output, which contains _PartInd_ */
PROC HPFOREST DATA=hpforest.split
      MAXTREES=200
      VARS_TO_TRY=5
      seed=600
      trainfraction=0.6
      maxdepth=50
      leafsize=6
      alpha=0.1;
   TARGET &target. / LEVEL=Nominal;
   INPUT &interval_inputs. / LEVEL=INTERVAL;
   Partition roleVar=_partind_(train='1' validate='0');
   ODS OUTPUT VARIABLEIMPORTANCE=LOSS_REDUCTION_IMPORTANCE;
   ODS OUTPUT FITSTATISTICS=FIT_STATISTICS;
   save file="&outdir/CTGF.sas";
RUN;

/* Print specific output tables */
PROC PRINT DATA=WORK.FIT_STATISTICS;
run;
PROC PRINT DATA=WORK.LOSS_REDUCTION_IMPORTANCE;
run;

I have a few follow-up questions about PROC HPFOREST.

Q1. How can I use the partition to split LOSS_REDUCTION_IMPORTANCE into separate training and validation graphs?
Q2. How can I score the HPFOREST model? I could not find the relevant information.

Kindly guide me on this.

Many Thanks!
Jimmy
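For Q2, one option is PROC HP4SCORE, which reads the model file written by the SAVE statement in PROC HPFOREST. Despite its .sas extension, the saved file is, as we understand it, a binary model file rather than SAS code, so it is read by HP4SCORE rather than %INCLUDEd. The sketch below scores only the validation rows of hpforest.split (WHERE= _PartInd_ = 0); the output name WORK.SCORED is an illustrative assumption, and because the names of the prediction columns HP4SCORE adds are not shown in this thread, the sketch lists them with PROC CONTENTS instead of guessing.

```sas
/* Sketch: score the validation rows with the forest saved by SAVE FILE= above. */
/* WORK.SCORED is an illustrative output name.                                  */
proc hp4score data=hpforest.split(where=(_partind_=0));
   id &target. _partind_;                   /* copy these columns to the output */
   score file="&outdir/CTGF.sas" out=work.scored;
run;

/* List the prediction columns HP4SCORE added */
proc contents data=work.scored;
run;
```

To score new data instead, point DATA= at a data set that contains the same input variables used to train the forest.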

 

 
