Hey community,
I have a sample that I am running through PROC HPSPLIT to fit a binary (one-split) decision tree. I plan to run a number of bootstrap versions of the data set through the procedure and record the value it splits on for the single continuous predictor. I created a reproducible example below.
First issue: PROC HPSPLIT does not seem to support a BY statement. I suppose a workaround is a wrapper (a macro loop) that feeds each Replicate value into a WHERE statement. I could probably figure that out, but I am a realist and know it would take me over an hour to get it working, so I am soliciting your help to write it.
Second issue: since you are all very savvy, I was also hoping you could figure out a way to pull out the split value for each bootstrap sample. The current code just has B = 5 replicates (rep = 5), but if I bump that up to 50, could you help create an automated way to pull out all of the split values and put them into a SAS data set? Let me know if you have questions!
proc surveyselect data=sashelp.heart out=heart_boot NOPRINT
seed=1
method=urs
samprate=1
outhits
rep = 5;
run;
data heart_boot_trim;
set heart_boot;
keep Replicate status ageatstart;
run;
ods graphics on;
ods trace on;
proc hpsplit data=heart_boot_trim maxdepth=1 ;
/* by Replicate; */ /*doesn't work*/
class Status;
model Status (event='Dead') = AgeAtStart;
prune costcomplexity;
partition fraction(validate=0.3 seed=1234);
rules file='rules.txt';
run;
ods graphics off;
ods trace off;
Well, I put a macro wrapper around it so that it runs the different bootstrap samples, but I am still looking for a way to pull out the split value on the age variable. For the code below, I would like the split value for each of the 5 generated trees, ideally placed into a file. As you can see, the code writes a file with the tree rules, but each iteration overwrites the file from the previous one. I am not sure whether getting it to output 5 files and scraping the values out of those is a good idea, or whether ODS TRACE can be used to find the pieces.
Thanks!
proc surveyselect data=sashelp.heart out=heart_boot NOPRINT
seed=1
method=urs
samprate=1
outhits
rep = 5;
run;
data heart_boot_trim;
set heart_boot;
keep Replicate status ageatstart;
run;
ods graphics on;
%macro run_model;
%do i=1 %to 5;
proc hpsplit data=heart_boot_trim maxdepth=1 ;
where replicate = &i;
/* by Replicate; */
class Status;
model Status (event='Dead') = AgeAtStart;
prune costcomplexity;
partition fraction(validate=0.3 seed=1234);
rules file='rules&&i.txt';
run;
%end;
%mend run_model;
%run_model;
So the desired file would have something like:
Sample Split_Value
1 45.340
2 47.150
3 45.170
4 46.020
5 45.340
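One possible direction, offered as a sketch rather than a tested solution: macro variables do not resolve inside single quotes, so 'rules&&i.txt' is the same literal file name on every pass, which would explain the overwriting. In double quotes, "rules&i..txt" should resolve to rules1.txt, rules2.txt, and so on (the first period ends the macro variable reference, the second is the literal dot). To collect the split information in a data set instead of scraping text files, the sketch below assumes PROC HPSPLIT's OUTPUT statement with NODESTATS= is available in your release. The data set names _nodes and split_nodes, the Sample variable, and the reps= parameter are hypothetical names chosen for illustration.

```sas
/* Sketch only: assumes heart_boot_trim was created as in the code above. */
/* _nodes, split_nodes, Sample, and reps= are illustrative names.         */
%macro run_model(reps=5);
  /* Start from a clean accumulator; NOWARN avoids a complaint on first run */
  proc datasets library=work nolist nowarn;
    delete split_nodes;
  quit;

  %do i=1 %to &reps;
    proc hpsplit data=heart_boot_trim maxdepth=1;
      where Replicate = &i;
      class Status;
      model Status (event='Dead') = AgeAtStart;
      prune costcomplexity;
      partition fraction(validate=0.3 seed=1234);
      /* Double quotes so &i resolves: rules1.txt, rules2.txt, ... */
      rules file="rules&i..txt";
      /* Assumption: NODESTATS= writes one row per tree node */
      output nodestats=_nodes;
    run;

    /* Tag the node rows with the bootstrap sample number */
    data _nodes;
      length Sample 8;
      set _nodes;
      Sample = &i;
    run;

    /* Stack the results; FORCE tolerates differing character lengths */
    proc append base=split_nodes data=_nodes force;
    run;
  %end;
%mend run_model;

%run_model(reps=5)
```

This does not yet produce the two-column table above. I would run PROC PRINT or PROC CONTENTS on split_nodes first to see which column actually holds the split rule for AgeAtStart, then subset and parse that column into Split_Value. Also keep in mind that with pruning turned on, a replicate might end up with no split at all, so some samples could be missing from the final table.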