jwong7
Calcite | Level 5

Dear Sir/Madam

I would like to know how I can output the validation probability values using the HP4SCORE procedure. I ran HPFOREST on the sampsio.HMEQ sample data set as shown below:

 

HPFOREST procedure:

proc hpsample data=sampsio.HMEQ out=hmeq sampobs=1788 seed=1234567 partition;
   class Bad Delinq Derog Job nInq Reason;
run;

proc hpforest data=hmeq
      maxtrees=500 vars_to_try=4
      seed=600 trainfraction=0.7
      maxdepth=50 leafsize=6
      alpha=0.05;
   target Bad / level=binary;
   input Delinq Derog Job nInq Reason / level=nominal;
   partition roleVar=_partind_(train='1' validate='0');
   ods output fitstatistics=fit;
   save file="/&outdir./Cars.sas";
run;

proc hp4score data=hmeq;
   id Bad;
   score file="/&outdir./Cars.sas" out=Score123;
run;

proc print data=Score123 (obs=100);
run;

 

The output of the HP4SCORE procedure displays only P_Bad0, P_Bad1, and I_Bad (please refer to the attachment); there are no probability columns for the validation data.

 

 

HP4Score.jpg

Question #1:

What does I_Bad mean in my HPFOREST output from the script above?

 

Similarly, when I apply the HPSPLIT procedure to the same data set, the scored data contains probability values for both training and validation; please see below:

 

HPSPLIT.JPG

 

SAS Code used:

data hmeq;
   length Bad Loan MortDue Value 8 Reason Job $7
          YoJ Derog Delinq CLAge nInq CLNo DebtInc 8;
   set sampsio.hmeq;
run;

proc print data=hmeq(obs=10);
run;

/* HPSPLIT procedure */
ods graphics on;
proc hpsplit data=hmeq maxdepth=5;
   class Bad Delinq Derog Job nInq Reason;
   model Bad(event='1') = Delinq Derog Job nInq Reason CLAge CLNo
                          DebtInc Loan MortDue Value YoJ;
   prune costcomplexity;
   partition fraction(validate=0.3 seed=1234567);
   /* grow gini; */
   code file="/&outdir/hpsplexc.sas";
run;

data scored;
   set hmeq;
   /* Double quotes are required so that &outdir resolves */
   %include "/&outdir/hpsplexc.sas";
run;

/* Print the first 500 scored records */
proc print data=WORK.scored (obs=500);
run;

 

 

Question #2:

What am I missing in my HPFOREST code that prevents me from getting the validation probabilities? With HPSPLIT, the scored output has both the training columns (P_Bad0, P_Bad1) and the validation columns (V_Bad0, V_Bad1).

 

Kindly advise.

Many thanks for your help!

Jimmy

5 REPLIES
WendyCzika
SAS Employee

I_BAD represents the predicted level of BAD based on the P_ variables.  So if P_BAD0 > P_BAD1, then I_BAD=0 (the prediction is BAD=0), and vice-versa.  
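As a minimal sketch of that rule (an illustration, not the procedure's internal code), the DATA step below recomputes the predicted level from the two posterior columns. It assumes the scored table is work.Score123 from the question; the output table work.check and the variable I_Bad_check are hypothetical names, and how the procedure itself resolves an exact tie is not covered here.

```sas
/* Sketch: recompute the predicted level from the posterior probabilities. */
/* Assumes work.Score123 holds P_Bad0 and P_Bad1, as in the question.      */
data work.check;                      /* hypothetical output table */
   set work.Score123;
   length I_Bad_check $1;             /* hypothetical comparison column */
   if P_Bad0 > P_Bad1 then I_Bad_check = '0';   /* BAD=0 is more likely */
   else I_Bad_check = '1';                      /* otherwise predict BAD=1 */
run;
```

Comparing I_Bad_check with I_Bad in a PROC FREQ cross-tabulation would be one way to confirm that the two columns agree.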

 

When you are scoring data, you don't use validation data, so you wouldn't get the V_ columns like you do when you are training the model.
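If the goal is to look at predicted probabilities for the validation rows, one possible approach is to carry the partition flag through scoring and then split the scored table by role. The sketch below assumes that _PartInd_ is the numeric flag written by PROC HPSAMPLE, that 1 marks training rows and 0 marks validation rows as coded in the question, and that the output names work.score_train and work.score_valid are hypothetical.

```sas
/* Sketch: keep the partition flag when scoring, then split by role. */
proc hp4score data=work.hmeq;
   id Bad _PartInd_;                  /* copy the target and the partition flag */
   score file="/&outdir./Cars.sas" out=work.Score123;
run;

data work.score_train work.score_valid;   /* hypothetical table names */
   set work.Score123;
   if _PartInd_ = 1 then output work.score_train;   /* train='1' in the question */
   else output work.score_valid;                    /* validate='0' in the question */
run;
```

In work.score_valid, the P_Bad0 and P_Bad1 columns then play the role that the V_ columns play in the training-time output: they are the model's probabilities for observations that were not used to grow the trees.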

jwong7
Calcite | Level 5

Thanks for your kind input. May I know how I can use the training data set to output the V_ columns?

Kindly advise.

 

 

 




 

jwong7
Calcite | Level 5

Thanks for your kind input. May I know how I can use the training data set to output the V_ columns using the HPFOREST procedure?

Kindly advise. Thanks!

WendyCzika
SAS Employee

Sorry, I'm not sure what you are asking.  Can you give some more details?

jwong7
Calcite | Level 5

Dear Wendy,

Sorry, I did not make my objective and questions clear. Basically, I want to use ROC curves to compare classification models, as in the picture below:

 

Model comparison.JPG

 

I got this information from the SAS YouTube video linked below:

https://www.youtube.com/watch?v=KMV5OtgTUUc

 

My understanding from the video is that the instructor ran each classification model and scored it. At the end, the instructor used Proc access to combine the models and compared them using the ROC chart above.
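For reference, a comparison of this kind can also be sketched with the ROC statement in PROC LOGISTIC, which is not necessarily the procedure used in the video. The sketch assumes a combined table, hypothetically named work.scored_all, with one row per validation observation, the actual target Bad, and one predicted P(Bad=1) column per model; the column names p_forest and p_tree are also hypothetical.

```sas
/* Sketch: overlay ROC curves for predictions produced by other procedures. */
/* work.scored_all, p_forest and p_tree are hypothetical names.             */
ods graphics on;
proc logistic data=work.scored_all;
   /* NOFIT: do not fit a new model, only evaluate the supplied predictions */
   model Bad(event='1') = p_forest p_tree / nofit;
   roc 'Forest' pred=p_forest;        /* predicted P(Bad=1) from HPFOREST */
   roc 'Tree'   pred=p_tree;          /* predicted P(Bad=1) from HPSPLIT  */
   roccontrast;                       /* compare the areas under the curves */
run;
```

Restricting work.scored_all to validation rows before this step keeps the comparison from rewarding a model for memorizing its training data.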

 

 

I tried the HPSPLIT procedure on the sampsio.HMEQ sample and scored it successfully; the output contains probability values for both the training and validation data, as shown below.

 

 

HPSPLIT.JPG

 

HPSplit scoring code used: 

 


ods graphics on;
proc hpsplit data=hmeq maxdepth=5;
   class Bad Delinq Derog Job nInq Reason;
   model Bad(event='1') = Delinq Derog Job nInq Reason CLAge CLNo
                          DebtInc Loan MortDue Value YoJ;
   prune costcomplexity;
   partition fraction(validate=0.3 seed=1234567);
   code file="/ABC/hpsplexc.sas";
run;

data scored;
   set hmeq;
   %include '/lustre/home/1000151442/hpsplexc.sas';
run;

 

Now I am trying to do the same thing with the same data set using the HPFOREST procedure.

 

 

proc hpsample data=sampsio.HMEQ
      out=hmeq sampobs=1788 seed=1234567 partition;
   class Bad Delinq Derog Job nInq Reason;
run;

proc hpforest data=work.hmeq
      maxtrees=500 vars_to_try=4
      seed=600
      maxdepth=50 leafsize=6
      alpha=0.05;
   target Bad / level=binary;
   input Delinq Derog Job nInq Reason / level=nominal;
   partition roleVar=_partind_(train='0' validate='1');
   ods output fitstatistics=fit;
   save file="/&outdir/abc.sas";
run;

proc hp4score data=work.hmeq;
   id Bad;
   score file="/&outdir/abc.sas" out=Score123;
run;

proc print data=Score123;
run;

 

Output results:

HPForest_Scored.JPG

 

I could not get the probability values V_BAD0 and V_BAD1 from HPFOREST as I did from HPSPLIT above.

 

Kindly guide me on this. 

 

Many thanks!

 

 

 

 

 

 

 

 

 

 

 
