Hey community,
I have a sample that I am running through PROC HPSPLIT to fit a binary (one-split) decision tree. I plan to run a number of bootstrap replicates of the data set through the procedure and record the value it splits on for the single continuous predictor. I created a reproducible example below.
First issue: PROC HPSPLIT does not seem to support a BY statement. I suppose a workaround is a macro wrapper that feeds the Replicate value into a WHERE statement. I could probably figure that out, but I am a realist and know it would take me over an hour to get it working, so I am soliciting your help to write it.
Second issue: since you all are very savvy, I was also hoping you could figure out a way to pull out the split value for each bootstrap sample. The current code only draws 5 replicates (rep = 5), but if I bump that up to 50, could you help create an automated way to pull out all of the split values and put them into a SAS data set? Let me know if you have questions!
proc surveyselect data=sashelp.heart out=heart_boot NOPRINT
seed=1
method=urs
samprate=1
outhits
rep = 5;
run;
data heart_boot_trim;
set heart_boot;
keep Replicate status ageatstart;
run;
ods graphics on;
ods trace on;
proc hpsplit data=heart_boot_trim maxdepth=1 ;
/* by Replicate; */ /*doesn't work*/
class Status;
model Status (event='Dead') = AgeAtStart;
prune costcomplexity;
partition fraction(validate=0.3 seed=1234);
rules file='rules.txt';
run;
ods graphics off;
ods trace off;
Well, I put a macro wrapper around it so it runs the different bootstrap samples, but I am still looking for a way to pull out the split value on the age variable. For the code below, I would like the split values for the 5 generated trees, ideally placed into a file. As you can see, the code writes out a file with the tree rules, but my code kept overwriting that file on each iteration. I am not sure whether getting it to output 5 files and scraping the values out of those is a good idea, or whether ODS TRACE can be used to find the pieces.
Thanks!
proc surveyselect data=sashelp.heart out=heart_boot NOPRINT
seed=1
method=urs
samprate=1
outhits
rep = 5;
run;
data heart_boot_trim;
set heart_boot;
keep Replicate status ageatstart;
run;
ods graphics on;
%macro run_model;
%do i=1 %to 5;
proc hpsplit data=heart_boot_trim maxdepth=1 ;
where replicate = &i;
/* by Replicate; */
class Status;
model Status (event='Dead') = AgeAtStart;
prune costcomplexity;
partition fraction(validate=0.3 seed=1234);
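/* double quotes so &i resolves (in single quotes every run wrote the same literal file name); the second dot ends the macro reference */
rules file="rules&i..txt";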
run;
%end;
%mend run_model;
%run_model;
So the desired file would have something like:
Sample Split_Value
1 45.340
2 47.150
3 45.170
4 46.020
5 45.340
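One possible way to build that file is to scrape the cut point out of each rules file with a DATA step. This is a rough sketch, not a tested solution. It assumes each replicate's rules were written to its own file named rules1.txt, rules2.txt, and so on, and that the rules text shows the predictor name followed by a comparison operator and a number. The exact layout of the rules file is an assumption here, so the regular expression may need adjusting after looking at one file. The macro name collect_splits and the data set name split_values are made up for illustration.

```sas
/* Hypothetical helper: read rules1.txt .. rules<nrep>.txt and keep the  */
/* first AgeAtStart cut point found in each file.                        */
/* Assumption: the rules text contains "AgeAtStart <operator> <number>". */
%macro collect_splits(nrep=5);
  data split_values (keep=Sample Split_Value);
    length fname $ 200;
    /* group 1 = comparison operator, group 2 = numeric cut point */
    re = prxparse('/AgeAtStart\s*(<=|>=|<|>)\s*(-?\d+(\.\d+)?)/i');
    do Sample = 1 to &nrep;
      fname = cats('rules', Sample, '.txt');
      found = 0;
      done = 0;
      /* FILEVAR= opens a new file whenever fname changes */
      infile rulesin filevar=fname end=done truncover;
      do while (not done);
        input line $char256.;
        if not found and prxmatch(re, line) then do;
          Split_Value = input(prxposn(re, 2, line), best32.);
          found = 1;
        end;
      end;
      /* no match, for example when the tree was pruned back to the root */
      if not found then Split_Value = .;
      output;
    end;
    stop;
  run;
%mend collect_splits;

%collect_splits(nrep=5);
```

Going from 5 to 50 replicates would then only mean changing rep= in PROC SURVEYSELECT, the upper bound of the %do loop, and the nrep= argument. A missing Split_Value flags a replicate where no split on AgeAtStart was found in the rules text.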