I have a dataset with 5 million observations.
95k have the target variable = 1; the rest have target = 0. I want to take a biased sample that includes all 95k target = 1 cases plus a random selection of the target = 0 cases, so that the split is 10% true and 90% false. Could anyone share some code to do this, please?
Thanks
So you want all 95k of the 1s, and you want those 95k to be 10% of your resulting dataset, meaning that you want 950,000 - 95,000 = 855,000 of the remaining observations from your data?
Yes,
Ultimately, I have 100 datasets and would like each sample to have a 10% hit rate while keeping all target hits where possible. So I want something I can loop over all of the datasets.
Unfortunately, sometimes I won't be able to use all of my target 'hit' observations, because they already represent more than 10% of the dataset. In that case I would undersample the 'hits' to ensure they make up 10% of the sample.
For the most part, though, fewer than 10% of the observations in each dataset have a hit for the target variable. I would like some code that oversamples the hits relative to the non-hits (keeping every hit and sampling the non-hits), so I can create a sample in which 10% of the observations have a hit.
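Nobody in the thread posted code, so here is a minimal sketch of the logic described above, written in plain Python rather than SAS. The function name `sample_to_hit_rate`, the `target` field, and the toy datasets are all hypothetical names chosen for illustration. It assumes each dataset fits in memory as a list of records. When hits are scarce it keeps every hit and draws a simple random sample of non-hits; when hits already exceed the target rate it keeps every non-hit and undersamples the hits instead (one possible reading of the fallback described above).

```python
import random

def sample_to_hit_rate(rows, is_hit, rate=0.10, seed=0):
    """Return a sample in which roughly `rate` of the rows are hits."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    hits = [r for r in rows if is_hit(r)]
    misses = [r for r in rows if not is_hit(r)]

    # Non-hits needed so that all hits make up `rate` of the sample,
    # e.g. 95,000 hits at 10% -> 855,000 non-hits.
    wanted_misses = round(len(hits) * (1 - rate) / rate)

    if wanted_misses <= len(misses):
        # Usual case: keep every hit, undersample the non-hits.
        misses = rng.sample(misses, wanted_misses)
    else:
        # Hits exceed the target rate: keep every non-hit,
        # undersample the hits (assumed fallback behaviour).
        wanted_hits = min(len(hits), round(len(misses) * rate / (1 - rate)))
        hits = rng.sample(hits, wanted_hits)

    return hits + misses

# Hypothetical toy data standing in for the 100 real datasets.
datasets = {
    "scarce_hits": [{"id": i, "target": int(i < 95)} for i in range(5000)],
    "many_hits": [{"id": i, "target": int(i < 50)} for i in range(140)],
}

for name, rows in datasets.items():
    sample = sample_to_hit_rate(rows, lambda r: r["target"] == 1)
    n_hits = sum(r["target"] for r in sample)
    print(name, len(sample), n_hits)
```

Because the counts are rounded to whole rows, the hit share can be off from 10% by a fraction of a percent on small datasets. The same two-branch logic should carry over to a stratified sampling step in whatever tool you are using.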