I am working on a predictive model with a sample of about 430K observations and a binary target. The rare event (1) has only about 1,000 observations. I need a way to balance the data so that the event ratio is about 50%. Currently I only have access to SAS Enterprise Guide.
What would you recommend for handling this? Sample code would be appreciated.
Thanks
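Do you want to oversample to a 1:1 ratio?
/* Copy the sample data set */
data class;
   set sashelp.class;
run;
/* PROC SURVEYSELECT needs the input sorted by the STRATA variable */
proc sort data=class;
   by sex;
run;
/* Select 5 records from each stratum (F, then M), giving a 1:1 ratio */
proc surveyselect data=class out=want sampsize=(5 5) seed=12345678;
   strata sex;
run;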
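Applied to the question's data, the same pattern might look like the sketch below. It assumes a hypothetical data set MODEL_DATA with a 0/1 variable TARGET; the DATA step only simulates data of roughly the described shape (about 430,000 rows, about 1,000 events) so the example runs on its own. The SELECTALL option keeps every event if the event stratum turns out to have fewer than 1,000 records.

```sas
/* Hypothetical stand-in for the real data: ~430,000 rows, ~1,000 events */
data model_data;
   call streaminit(12345);
   do id = 1 to 430000;
      target = rand('bernoulli', 1000/430000);
      output;
   end;
run;

/* The STRATA variable must be sorted */
proc sort data=model_data;
   by target;
run;

/* Request 1,000 records per stratum (target=0 first, then target=1);
   SELECTALL takes the whole stratum if it holds fewer than 1,000 records */
proc surveyselect data=model_data out=sample_fixed
                  method=srs sampsize=(1000 1000) selectall seed=12345678;
   strata target;
run;

/* Check the class balance in the sample */
proc freq data=sample_fixed;
   tables target / nocum;
run;
```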
Which task or procedure are you using? I'm assuming logistic regression, but that depends partly on your variables.
If you specify your model and procedure, you can find complete examples in the SAS documentation for that procedure.
Use PROC SURVEYSELECT, stratify by your target variable, and request sampling rates of 1 and 1/430 for your target and non-target strata, respectively.
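A minimal sketch of the sampling-rate approach, again on hypothetical simulated data (MODEL_DATA and TARGET are assumed names). With a STRATA statement, the SAMPRATE= values are listed in the sorted stratum order, so the non-event rate (about 1/430) comes first and the event rate of 1 (read by SURVEYSELECT as 100%) comes second.

```sas
/* Hypothetical stand-in for the real data: ~430,000 rows, ~1,000 events */
data model_data;
   call streaminit(12345);
   do id = 1 to 430000;
      target = rand('bernoulli', 1000/430000);
      output;
   end;
run;

proc sort data=model_data;
   by target;
run;

/* target=0: keep about 1/430 of the rows; target=1: keep all rows */
proc surveyselect data=model_data out=sample_rate
                  method=srs samprate=(0.002326 1) seed=12345678;
   strata target;
run;

proc freq data=sample_rate;
   tables target / nocum;
run;
```

Because a rate rather than a count is requested, the non-event count only approximates the event count; the next reply's exact-size option removes that slack.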
Or, instead of a sampling rate, you can specify an exact sample size. If you request 100 from each stratum, half of your sample will be events and half non-events.
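One way to make the exact-size option match whatever the event count turns out to be is to count the events first and request that many records from both strata. This is a sketch on hypothetical simulated data; MODEL_DATA, TARGET and the macro variable N_EVENTS are illustrative names, not from the thread.

```sas
/* Hypothetical stand-in for the real data: ~430,000 rows, ~1,000 events */
data model_data;
   call streaminit(12345);
   do id = 1 to 430000;
      target = rand('bernoulli', 1000/430000);
      output;
   end;
run;

proc sort data=model_data;
   by target;
run;

/* Count the events and store the count in a macro variable */
proc sql noprint;
   select count(*) into :n_events trimmed
   from model_data
   where target = 1;
quit;
%put NOTE: number of events = &n_events;

/* Same size for both strata: all events plus an equal number of non-events */
proc surveyselect data=model_data out=sample_exact
                  method=srs sampsize=(&n_events &n_events) seed=12345678;
   strata target;
run;
```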
It is also helpful to know that PROC SURVEYSELECT can output both the selection probability and the sampling weight for each selected record.
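A sketch of how those weights could be used, assuming logistic regression as suggested earlier in the thread and a single hypothetical predictor X in simulated data. The STATS option asks SURVEYSELECT to write SelectionProb and SamplingWeight to the output data set. The unweighted fit describes the 1:1 sample, so its intercept and predicted probabilities reflect a 50% event rate; adding SamplingWeight as a WEIGHT makes the point estimates approximate a fit on the full data. Standard errors from a WEIGHT statement are not design-based; PROC SURVEYLOGISTIC is the procedure intended for that.

```sas
/* Hypothetical data: ~430,000 rows, one predictor x, roughly 1,000 events */
data model_data;
   call streaminit(12345);
   do id = 1 to 430000;
      x = rand('normal');
      p = 1 / (1 + exp(-(-6.4 + 0.8*x)));   /* event probability rises with x */
      target = rand('bernoulli', p);
      output;
   end;
   drop p;
run;

proc sort data=model_data;
   by target;
run;

/* STATS adds SelectionProb and SamplingWeight to the output data set */
proc surveyselect data=model_data out=balanced stats
                  method=srs sampsize=(1000 1000) selectall seed=12345678;
   strata target;
run;

proc print data=balanced(obs=5);
   var target x SelectionProb SamplingWeight;
run;

/* Unweighted fit on the 1:1 sample */
proc logistic data=balanced;
   model target(event='1') = x;
run;

/* Weighted fit: each record stands for SamplingWeight records of the full data */
proc logistic data=balanced;
   model target(event='1') = x;
   weight SamplingWeight;
run;
```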
I do want to oversample to a 1:1 ratio.