
Hey community,

 

I have a sample that I am running through PROC HPSPLIT to fit a binary (one-split) decision tree. I was planning to run a bunch of bootstrap versions of the data set through the procedure and record the value it splits on for the single continuous predictor. I created a reproducible example below.

 

First issue: PROC HPSPLIT does not seem to have a BY statement. I suppose a workaround is a wrapper piece of code (a macro loop) that feeds each Replicate value into a WHERE statement. I could probably figure that out, but I am a realist and know it would take me over an hour to get it working, so I am soliciting your help to write it.

 

Second issue: since you all are very savvy, I was also hoping you could figure out a way to pull out the split value for each bootstrap sample. The current code just has B=5 (rep = 5), but if I bump it up to 50, could you help create an automated way to pull out all of the split values and put them into a SAS data set? Let me know if you have questions!

 

proc surveyselect data=sashelp.heart out=heart_boot noprint
     seed=1
     method=urs
     samprate=1
     outhits
     rep=5;
run;


data heart_boot_trim;
	set heart_boot;
	keep Replicate status ageatstart;
run;

ods graphics on;
ods trace on;
proc hpsplit data=heart_boot_trim maxdepth=1;
   /* by Replicate; */  /* doesn't work */
   class Status;
   model Status (event='Dead') = AgeAtStart;
   prune costcomplexity;
   partition fraction(validate=0.3 seed=1234);
   rules file='rules.txt';
run;
ods graphics off;
ods trace off;
1 REPLY

Well, I put a macro wrapper around it so it runs the different bootstrap samples, but I am still looking for a way to pull out the split value on the age variable. For the code below, I would like the split value for each of the 5 generated trees, ideally placed into a file. The code writes the tree rules to a text file, and each replicate needs its own file name or the output is overwritten. I am not sure whether writing 5 files and scraping the values out of them is a good idea, or whether ODS TRACE can be used to find the pieces.

 

Thanks!

proc surveyselect data=sashelp.heart out=heart_boot NOPRINT 
     seed=1                                         
     method=urs 
     samprate=1                          
     outhits
     rep = 5; 
run;

data heart_boot_trim;
	set heart_boot;
	keep Replicate status ageatstart;
run;

ods graphics on;
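%macro run_model;
   %do i = 1 %to 5;
      proc hpsplit data=heart_boot_trim maxdepth=1;
         where Replicate = &i;   /* stands in for the unsupported BY statement */
         class Status;
         model Status (event='Dead') = AgeAtStart;
         prune costcomplexity;
         partition fraction(validate=0.3 seed=1234);
         /* Double quotes so &i resolves (single quotes write the literal  */
         /* name on every pass); the second dot is the file extension dot. */
         rules file="rules&i..txt";
      run;
   %end;
%mend run_model;
%run_model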
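One way to avoid scraping text files might be the OUTPUT statement's NODESTATS= option, which writes a description of the fitted tree to a data set. The following is only a sketch under assumptions: the names run_model_stats, reps, node_&i and all_nodes are made up for illustration, and the exact columns in the NODESTATS= data set should be checked in your SAS/STAT release before relying on any of them.

```sas
/* Sketch: one NODESTATS= data set per replicate, stacked into all_nodes. */
/* Macro, parameter, and data set names are hypothetical.                 */
%macro run_model_stats(reps=5);
   /* Start clean so reruns do not append to old results */
   proc datasets lib=work nolist;
      delete all_nodes;
   quit;

   %do i = 1 %to &reps;
      proc hpsplit data=heart_boot_trim maxdepth=1;
         where Replicate = &i;
         class Status;
         model Status (event='Dead') = AgeAtStart;
         prune costcomplexity;
         partition fraction(validate=0.3 seed=1234);
         output nodestats=node_&i;   /* tree description for this replicate */
      run;

      /* Tag the rows with the replicate number */
      data node_&i;
         set node_&i;
         Sample = &i;
      run;

      proc append base=all_nodes data=node_&i force;
      run;
   %end;
%mend run_model_stats;

%run_model_stats(reps=5)

/* Inspect the result to see which column carries the split threshold */
proc print data=all_nodes;
run;
```

Bumping the bootstrap count to 50 would then mean changing rep= in PROC SURVEYSELECT and reps= in the macro call together. If pruning collapses a tree back to the root, that replicate will have no split to report.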

 

 

So the desired file would have something like:

Sample   Split_Value
1        45.340
2        47.150
3        45.170
4        46.020
5        45.340
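If the rules files are the route taken, a table like this could be assembled roughly as follows. This is a minimal sketch with assumptions stated up front: it assumes the loop wrote one file per replicate named rules1.txt, rules2.txt, and so on in the current working directory, that the host supports wildcards on INFILE, and that each rules file contains a condition of the form "AgeAtStart < value". The exact layout of the rules text is not confirmed here, so the regular expression may need adjusting after looking at one file. The data set name split_values is made up.

```sas
/* Sketch: scrape the AgeAtStart threshold out of rules1.txt, rules2.txt, ... */
data split_values;
   length fname $260;
   /* FILENAME= exposes the file currently being read (not kept in output) */
   infile "rules*.txt" filename=fname truncover;
   input line $char256.;

   retain rx;
   if _n_ = 1 then rx = prxparse('/AgeAtStart\s*<\s*([0-9.]+)/i');

   if prxmatch(rx, line) then do;
      /* Replicate number taken from the digits in the file name */
      Sample = input(prxchange('s/.*rules(\d+)\.txt$/$1/i', 1, strip(fname)), 8.);
      /* Threshold captured by the first group of the regex */
      Split_Value = input(prxposn(rx, 1, line), best32.);
      output;
   end;

   keep Sample Split_Value;
run;

/* One row per replicate, in case the condition appears more than once */
proc sort data=split_values nodupkey;
   by Sample;
run;
```

A replicate whose tree was pruned back to a single node would have no matching line and so would simply be missing from split_values.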
