
In HPSPLIT, if the dependent variable is dichotomous and unbalanced, there is a workaround (albeit a somewhat awkward one) that gives a better balance between sensitivity and specificity. It would be better if this were built into HPSPLIT itself. However, when there are more than two categories, there is no workaround at all. For instance, if we divide birth weight into three categories (as is often done, even if that is a bad idea):

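data new;
   set sashelp.bweight;
   count + 1;   /* running observation counter */

   /* Declare lengths up front. Otherwise bwcat2 takes its length (3) from
      its first assignment, "Low", and "Normal" is truncated to "Nor". */
   length bwcat3 $ 11 bwcat2 $ 6;

   if weight < 1500 then do;
      bwcat3 = "1: Very low";
      bwcat2 = "Low";
   end;
   else if weight < 2500 then do;
      bwcat3 = "2: Low";
      bwcat2 = "Low";
   end;
   else do;
      bwcat3 = "3: Normal";
      bwcat2 = "Normal";
   end;
run;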

and then run HPSPLIT:

proc hpsplit data = new seed = 123;
   class black boy married momedlevel momsmoke bwcat3;
   model bwcat3 = black boy married momedlevel momsmoke momage momwtgain visit cigsperday;
   output out=hpsplout;
run;

we get an absolutely awful classification, with pretty much all babies labeled "Normal".
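
To see how unbalanced the outcome is, a quick frequency table of the derived categories can help. This is only a minimal sketch: it assumes the data set new from the DATA step above, tabulates bwcat3 and bwcat2, and does not touch the model itself.

```sas
/* Count how many babies fall into each birth weight category */
proc freq data = new;
   tables bwcat3 bwcat2;
run;
```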

2 Comments
RalphAbbey
SAS Employee

Thank you for the suggestion on improvements to HPSPLIT. For the time being, I want to suggest an approach that might mitigate some of your concerns when the response has more than two categories (obviously, your original points are still valid).

 

You could try a two-model process, where you first predict "normal" vs. "not-normal." Then, for the "not-normal" cases, you try to predict "low" vs. "very low." By grouping low and very low together for the first model, you make the categories a little more balanced, and you also do only a binary prediction instead of one with more than two categories. For any observations predicted as "not-normal," you follow up with a second model to predict which type of "not-normal" they are.
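
As a rough illustration of the two-model idea, the sketch below reuses the data set new and the variables bwcat2 and bwcat3 from the original post. The data set names lowonly, stage1out, and stage2out are made up for this example. For simplicity, the second model is fit on babies whose observed weight is below 2500 rather than on babies predicted "not-normal" by the first model; chaining the two predictions would require scoring with the first tree, which is not shown here.

```sas
/* Stage 1: binary model, "Low" (low or very low) vs. "Normal" */
proc hpsplit data = new seed = 123;
   class black boy married momedlevel momsmoke bwcat2;
   model bwcat2 = black boy married momedlevel momsmoke momage momwtgain visit cigsperday;
   output out = stage1out;   /* hypothetical output data set name */
run;

/* Keep only the not-normal babies (observed weight below 2500) */
data lowonly;                /* hypothetical data set name */
   set new;
   where weight < 2500;
run;

/* Stage 2: within the not-normal group, "1: Very low" vs. the low category */
proc hpsplit data = lowonly seed = 123;
   class black boy married momedlevel momsmoke bwcat3;
   model bwcat3 = black boy married momedlevel momsmoke momage momwtgain visit cigsperday;
   output out = stage2out;   /* hypothetical output data set name */
run;
```

Each stage is a two-level problem, so whatever balancing approach is used for a dichotomous response could in principle be applied to each stage separately.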

 

While this is only a workaround, hopefully it can help!

plf515
Lapis Lazuli | Level 10

Thanks, Ralph. I had tried that; it's sensible. It didn't work too well in this particular case, but it ought to work well sometimes.

 

HPSPLIT is cool but I do have some ideas for improving it.

 

Peter