EC27556
Quartz | Level 8

I have a dataset with 5 million observations.

95k observations have the target variable = 1; the rest have 0. I want to take a biased sample that includes all 95k of the = 1 cases plus a selection of the = 0 cases, so that the split is 10% true and 90% false. Could anyone share some code to do this, please?

 

Thanks

2 REPLIES
PeterClemmensen
Tourmaline | Level 20

So you want all 95k of the 1's, and you want those 95k to make up 10% of your resulting dataset, meaning that you want 950,000 - 95,000 = 855,000 of the remaining observations from your data?
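The arithmetic in this reply can be checked with a short sketch. It is written in Python purely to illustrate the calculation, not as SAS code, and the function name `non_hits_needed` and its `hit_pct` parameter are hypothetical names chosen for this example.

```python
def non_hits_needed(n_hits, hit_pct=10):
    """Number of target = 0 rows needed so that n_hits rows make up hit_pct% of the sample."""
    # Total sample size at which the hits are exactly hit_pct percent
    # (integer arithmetic; the result is floored if it does not divide evenly).
    total = n_hits * 100 // hit_pct
    return total - n_hits


# The 95k hits from the question, at a 10% hit rate.
print(non_hits_needed(95_000))  # 855000
```

With 95,000 hits the full sample is 950,000 rows, so 855,000 of the roughly 4.9 million target = 0 rows would be drawn at random.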

EC27556
Quartz | Level 8

Yes,

 

Ultimately, I have 100 datasets and would like each sample to have a 10% hit rate while keeping all target hits where possible. So I want something I can run in a loop over all the datasets.

 

Unfortunately, sometimes I won't be able to use all of my target 'hit' observations, because they already represent more than 10% of the aggregate dataset. In this case I would undersample the 'hits' to ensure I have 10% in the sample.

 

For the most part, though, fewer than 10% of the observations in the aggregate datasets have a hit for the target variable. I would like some code to oversample the hits so that I can create a sample in which 10% of the observations have a hit.
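The two cases described above (keep every hit when hits are below 10%, undersample the hits when they are above 10%) can be sketched as one reusable function. This is a minimal Python illustration of the logic rather than SAS code, and it rests on several assumptions: the target is coded 0/1, "oversampling the hits" is achieved by drawing fewer non-hits rather than by duplicating rows, and when hits exceed 10% all non-hits are kept. The names `sample_to_hit_rate`, `hit_pct` and `seed` are hypothetical.

```python
import random


def sample_to_hit_rate(targets, hit_pct=10, seed=0):
    """Return sorted row indices of a sample in which about hit_pct% of rows are hits.

    targets: sequence of 0/1 target values, one per row (assumed coding).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    hits = [i for i, t in enumerate(targets) if t == 1]
    non_hits = [i for i, t in enumerate(targets) if t == 0]

    # Non-hits required if every hit is kept.
    want_non = len(hits) * (100 - hit_pct) // hit_pct

    if want_non <= len(non_hits):
        # Usual case: hits are below hit_pct%, so keep all hits
        # and draw a simple random sample of the non-hits.
        keep_hits = hits
        keep_non = rng.sample(non_hits, want_non)
    else:
        # Hits exceed hit_pct%: keep all non-hits (an assumption)
        # and undersample the hits instead.
        want_hits = len(non_hits) * hit_pct // (100 - hit_pct)
        keep_hits = rng.sample(hits, want_hits)
        keep_non = non_hits

    return sorted(keep_hits + keep_non)


# Looping over several datasets, here two tiny made-up target columns.
datasets = {
    "few_hits": [1] * 5 + [0] * 95,    # 5% hits: keep all 5, draw 45 non-hits
    "many_hits": [1] * 50 + [0] * 90,  # about 36% hits: keep 90 non-hits, draw 10 hits
}
for name, targets in datasets.items():
    rows = sample_to_hit_rate(targets)
    n_hits = sum(targets[i] for i in rows)
    print(name, len(rows), n_hits)
```

In SAS itself, the same idea would typically be expressed as a stratified random sample on the target variable with a separate sample size per stratum, wrapped in a macro loop over the dataset names; the sketch above only shows how those per-stratum sizes could be derived.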
