lukholoman
Calcite | Level 5

Greetings, I am working with imbalanced data: my target has 1,465 "Fail" events and 58,744 non-events. I have decided to undersample the data. How do I adjust for the prior probabilities in PROC HPSPLIT? Is that supported?

3 REPLIES
StatDave
SAS Super FREQ

As far as I know, the tree-based method implemented in PROC HPSPLIT has no provision specifically for handling under- or oversampling of the response. However, this issue often comes up in logistic modeling of a binary response, and this note discusses methods for dealing with it. Although those methods were not designed for tree-based models, you could consider the weighting method presented in the note, since HPSPLIT does support a WEIGHT statement.
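
One way to read the weighting idea, as a minimal sketch and not necessarily what the note prescribes: give each class the weight (population share) / (sample share), so that the weighted undersample reproduces the original class mix. The class counts come from the question; the 1:1 undersample (1,465 non-events kept) and the function name `class_weights` are assumptions made for illustration. Python is used here only for the arithmetic.

```python
def class_weights(n_event, n_nonevent, kept_event, kept_nonevent):
    """Return (event_weight, nonevent_weight) as population share / sample share."""
    pop_event = n_event / (n_event + n_nonevent)            # true event rate
    samp_event = kept_event / (kept_event + kept_nonevent)  # event rate after undersampling
    w_event = pop_event / samp_event
    w_nonevent = (1 - pop_event) / (1 - samp_event)
    return w_event, w_nonevent


# Counts from the question; the 1:1 undersample is an assumed example.
w_event, w_nonevent = class_weights(1465, 58744, 1465, 1465)
print(w_event, w_nonevent)
```

In SAS, these two values would be stored in a column of the undersampled data set and that column named on the WEIGHT statement. Only the ratio between the two weights matters for restoring the class mix; this scaling also keeps the sum of weights equal to the sample size.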

sbxkoenk
SAS Super FREQ

@StatDave wrote:

... and there are methods discussed in this note for dealing with that. 


Topic = adjusting the posterior probabilities for the real priors after under-sampling the majority class in binary classification.
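
For readers who want the arithmetic behind that topic, here is a minimal sketch of the standard prior-correction formula, assuming the model was fit on an undersample with a known event share. The function name `adjust_posterior`, the 1:1 sample share of 0.5, and the example probabilities are assumptions for illustration; only the class counts come from the original question.

```python
def adjust_posterior(p_sample, prior_event, sample_event):
    """Rescale an event probability from the undersampled model to the true prior.

    p_sample     : predicted event probability from the model fit on the undersample
    prior_event  : event share in the full population
    sample_event : event share in the undersampled training data
    """
    num = p_sample * prior_event / sample_event
    den = num + (1 - p_sample) * (1 - prior_event) / (1 - sample_event)
    return num / den


true_prior = 1465 / (1465 + 58744)  # event share in the original data
sample_prior = 0.5                  # assumed 1:1 undersample

for p in (0.1, 0.5, 0.9):
    print(p, adjust_posterior(p, true_prior, sample_prior))
```

A score equal to the sample event share maps back to the true prior, and a model fit on data whose event share already matches the population is left unchanged.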

 

If you want some additional prose on that topic and on that note (mentioned by @StatDave), see here:

 

Ciao,

Koen

Ksharp
Super User

As @StatDave showed you, to adjust for the prior probabilities, check the PRIOR= option of the SCORE statement in PROC LOGISTIC:

   proc logistic data=out;
        model y(event="1")=x;
        /* priors: data set with the true class priors (variables y and _PRIOR_) */
        score data=sub prior=priors out=out2;
   run;
