When I run PROC HPSPLIT code in local EG vs. on a server (SASApp), I get different results. I also ran PROC PRODUCT_STATUS, and the same SAS packages are installed both locally (EG) and on the server for both SAS/STAT and High Performance Suite. Below is the code; attached are the HPSPLIT outputs from both runs:
proc hpsplit data= panel_Consumer_surveyselect LEAFSIZE= 6792 MAXBRANCH=3 ASSIGNMISSING = branch nodes splitonce;
class MSA_US_Ind Fixed_Ind Coll_Used_Vehicle_Ind Coll_Unsecured_Ind CoSign_Ind;
title "Decision Tree Max Leaves = 12 Branch = 3";
model default_flag = Orig_FICO_Score Term MSA_US_Ind Fixed_Ind Coll_Used_Vehicle_Ind Coll_Unsecured_Ind CoSign_Ind;
partition rolevar=selected(TRAIN='1' VALIDATE='0');
prune costcomplexity(leaves=12);
run;
Local SAS EG Run
Server Run
Your help is much appreciated.
It looks like your input data set was generated by using PROC SURVEYSELECT to divide the data into training and validation data. Make sure that you used the SEED= option so that the training and validation subsets are EXACTLY the same for each run.
In general, I would advise that you verify that the input data sets are equal. You can use programming and PROC COMPARE to automate this, or you can just use PROC PRINT with the OBS=20 data set option to visually compare the first few observations of the input data sets.
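A minimal sketch of the PROC COMPARE check, assuming both copies of the HPSPLIT input can be brought into one SAS session (for example, by copying the server's data set down to the local machine). The libref SRVCOPY is hypothetical, and WORK as the location of the local copy is an assumption.

```sas
/* Sketch: compare the local and server copies of the HPSPLIT input.  */
/* ASSUMPTION: the local copy is in WORK and the server copy has been */
/* copied into a library with the HYPOTHETICAL libref SRVCOPY.        */
proc compare base=work.panel_Consumer_surveyselect
             compare=srvcopy.panel_Consumer_surveyselect;
run;

/* PROC COMPARE stores its return code in SYSINFO; 0 means no differences */
%put NOTE: PROC COMPARE return code SYSINFO=&sysinfo;
```

A nonzero SYSINFO value is a bit-coded summary of the kinds of differences found (for example, different numbers of observations or unequal values), which narrows down where the two inputs diverge.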
I suppose it is also possible that the remote server is running a different version of SAS than the local server, so check the results of %PUT &=SYSVLONG4;
Thank you, Rick.
Yes, the SEED= option is used; it is part of the same macro that is executed in both environments. Please see the snippet below:
proc surveyselect data= panel_&portfolio. out=panel_&portfolio._surveyselect seed=123
samprate=0.6 method= srs outall; *** select a 60% simple random sample (Selected=1) for training ***;
run;
The input data are the same (static) in both instances.
Below are the results of %PUT &=SYSVLONG4;
Local : SYSVLONG4=9.04.01M6P11152018
Server : SYSVLONG4=9.04.01M6P11072018
Both report 9.04.01M6, but the suffixes differ (P11152018 vs. P11072018), so there seems to be a difference in versions. Please let me know your thoughts on remediation.
Thank you for your input.
I don't know what the difference in the SYSVLONG4 means. Tech Support could tell you.
The only other idea I have is a host difference or a difference in "bitness". Could it be that the remote server is running Linux whereas the local server is running Windows? Or that the remote server is running 64-bit SAS whereas the local server is running 32-bit SAS? Try this command:
%put &=SYSSCP, &=SYSSCPL;
I think it might be worth looking more into Rick's comment about checking whether the data sets are exactly the same in the two runs. In the output you attached, the two "Node Information" tables report different "Pred" values for the Training and Validation data for the Root Node (0.00191 and 0.00195 versus 0.00200 and 0.00182). I believe that indicates that the two runs of PROC HPSPLIT are working with different training and validation partitions, which would lead to the different results.
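To check this directly, one could compute the event rate of default_flag within each partition in both environments and compare it with the root-node Pred values. This is a sketch that assumes default_flag is numeric and coded 0/1 (so that its mean is the proportion of defaults) and that the Pred column reports that proportion; the output data set name partition_rates is hypothetical.

```sas
/* Sketch: N and event rate of default_flag within each partition.    */
/* ASSUMPTION: default_flag is numeric 0/1, so its mean is the default */
/* rate. Selected=1 is TRAIN and Selected=0 is VALIDATE, matching the  */
/* PARTITION statement in the HPSPLIT code.                            */
proc means data=panel_Consumer_surveyselect nway n mean maxdec=5;
  class selected;
  var default_flag;
  /* partition_rates is a HYPOTHETICAL output name */
  output out=partition_rates n=n_obs mean=event_rate;
run;

/* Compact table to compare across the two environments */
proc print data=partition_rates noobs;
  var selected n_obs event_rate;
  format event_rate 8.5;
run;
```

If the two environments produce different counts or rates in this table, the partitions differ before PROC HPSPLIT ever runs.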
As for why the data sets might be different: in addition to (or in conjunction with) possible "bitness" and host differences, the order of the input data provided to PROC SURVEYSELECT affects the simple random sample even when the same seed is provided, as the example below shows.
data exData;
do i=1 to 1000;
x = rand("uniform");
output;
end;
run;
/* Sampling for data ordered by i */
proc surveyselect data=exData out=exData_byi
seed=123 samprate=0.6 method=srs outall;
run;
proc sort data=exData;
by x;
run;
/* Sampling for data ordered by x */
proc surveyselect data=exData out=exData_byx
seed=123 samprate=0.6 method=srs outall;
run;
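/* Different Samples: re-sort by i so the observations line up, */
/* then compare only the Selected flag                          */
proc sort data=exData_byx;
by i;
run;
proc compare data=exData_byi c=exData_byx;
id i;
var selected;
run;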
proc sort data=exData;
by i;
run;
/* Sampling for data ordered by i again */
proc surveyselect data=exData out=exData_byi2
seed=123 samprate=0.6 method=srs outall;
run;
/* Same Samples */
proc compare data=exData_byi c=exData_byi2;
run;
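Given that dependence on row order, one way to make the split reproducible, sketched here under the assumption that the panel has a unique key (loan_id below is a hypothetical name), is to sort on that key immediately before PROC SURVEYSELECT in both environments. This removes the ordering effect demonstrated above; it does not by itself rule out the host or bitness differences mentioned earlier.

```sas
/* Sketch: fix the row order before sampling so that SEED=123 selects */
/* the same records in both environments.                             */
/* loan_id is a HYPOTHETICAL key; it must be unique, otherwise ties   */
/* keep their (possibly different) input order.                       */
proc sort data=panel_&portfolio. out=panel_&portfolio._sorted;
  by loan_id;
run;

proc surveyselect data=panel_&portfolio._sorted
                  out=panel_&portfolio._surveyselect
                  seed=123 samprate=0.6 method=srs outall;
run;
```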