In HPSPLIT, if the dependent variable is dichotomous and unbalanced, there is a fix (albeit a somewhat awkward one) that lets you better balance sensitivity and specificity. That fix should be built into HPSPLIT itself. However, when there are more than 2 categories, there is no comparable workaround. For instance, suppose we divide birthweight into 3 categories (as is often done, even if that is a bad idea):
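data new;
   set sashelp.bweight;
   /* Declare lengths up front: otherwise bwcat2 gets length $3 from its
      first assignment ("Low") and "Normal" is truncated to "Nor" */
   length bwcat3 $11 bwcat2 $6;
   count + 1;   /* running observation number */
   if weight < 1500 then do;
      bwcat3 = "1: Very low";
      bwcat2 = "Low";
   end;
   else if weight < 2500 then do;
      bwcat3 = "2: Low";
      bwcat2 = "Low";
   end;
   else do;
      bwcat3 = "3: Normal";
      bwcat2 = "Normal";
   end;
run;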
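A quick way to see how unbalanced these categories are is to tabulate them before fitting anything. A minimal sketch using the data set created above:

```sas
/* Counts and percentages for each birthweight category */
proc freq data=new;
   tables bwcat3 bwcat2;
run;
```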
Then run HPSPLIT:
proc hpsplit data=new seed=123;
   class black boy married momedlevel momsmoke bwcat3;
   model bwcat3 = black boy married momedlevel momsmoke
                  momage momwtgain visit cigsperday;
   output out=hpsplout;
run;
The resulting classification is very poor: nearly every baby is predicted to be "Normal".
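For comparison, here is a sketch of one common form of a two-category workaround: fit the tree to bwcat2, then classify with a lower cutoff on the predicted probability of "Low" instead of the default highest-probability rule. This is an assumption about what such a fix looks like, not necessarily the one referred to above. The column name P_bwcat2Low, the presence of bwcat2 in the OUT= data set, and the 0.10 cutoff are all assumptions; check the output with PROC CONTENTS and tune the cutoff. With three categories there is no single cutoff to move in this way.

```sas
/* Fit the tree to the two-category outcome */
proc hpsplit data=new seed=123;
   class black boy married momedlevel momsmoke bwcat2;
   model bwcat2 = black boy married momedlevel momsmoke
                  momage momwtgain visit cigsperday;
   output out=hpsplout2;
run;

/* Reclassify with a hypothetical cutoff of 0.10 on P(Low).
   ASSUMPTION: the probability column is named P_bwcat2Low and the
   OUT= data set also carries the observed bwcat2 (verify with PROC CONTENTS). */
data scored2;
   set hpsplout2;
   length pred2 $6;
   if P_bwcat2Low >= 0.10 then pred2 = "Low";
   else pred2 = "Normal";
run;

/* Observed vs. predicted counts; sensitivity and specificity can be read off this table */
proc freq data=scored2;
   tables bwcat2*pred2 / norow nocol nopercent;
run;
```

Lowering the cutoff trades specificity for sensitivity on the "Low" class; rerunning the last two steps with different cutoffs shows that trade-off directly.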