JonB_
Calcite | Level 5

I intend to build a predictive model. I have a large data set with 10 features and 1 interval target. One of those features is numeric, ranging from 1 to 900. I know that, due to some underlying changes in the population, records with values from about 1 to 250 are underrepresented in my sample, and records with values of 251 and above are overrepresented. I know approximately what the distribution of this feature should look like. Is there an easy way to sample from the data set with replacement so that the distribution of this feature matches percentages I specify?

Thanks.

1 ACCEPTED SOLUTION
ballardw
Super User

One way would be to add a strata variable based on whether the value is over or under the given break point. I don't know if Enterprise Miner (EM) has a direct sampling tool, but PROC SURVEYSELECT allows setting a sampling rate per stratum.
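
For readers who want to see the idea outside SAS, here is a minimal Python sketch of stratified sampling with replacement. It is not PROC SURVEYSELECT and not necessarily how the poster would set it up. The feature name `x`, the break point of 250, the made-up data, and the 40%/60% target split are all assumptions chosen for illustration.

```python
import random

def assign_stratum(value, break_point=250):
    """Label a record by which side of the break point its feature value falls on."""
    return "low" if value <= break_point else "high"

def stratified_resample(records, key, sizes, seed=0):
    """Draw sizes[stratum] records with replacement from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(assign_stratum(rec[key]), []).append(rec)
    sample = []
    for name, n in sizes.items():
        # random.choices samples with replacement
        sample.extend(rng.choices(strata[name], k=n))
    return sample

# Hypothetical data: 1..250 underrepresented (100 rows), 251..900 overrepresented (900 rows)
data_rng = random.Random(1)
records = [{"x": data_rng.randint(1, 250)} for _ in range(100)]
records += [{"x": data_rng.randint(251, 900)} for _ in range(900)]

# Assumed target mix: 40% low, 60% high, in a sample of 1000 rows
sample = stratified_resample(records, "x", {"low": 400, "high": 600})
low_share = sum(r["x"] <= 250 for r in sample) / len(sample)
print(low_share)
```

Because each stratum is sampled separately with a fixed size, the low/high mix of the result is set by the requested sizes and not by the mix in the original data.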


3 REPLIES
JonB_
Calcite | Level 5

I ended up breaking my data into several segments depending on the value of my numeric feature, and used PROC SURVEYSELECT to sample with replacement from the individual pieces until the overall distribution of my data looked as I expected it to. Thanks!
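
The segment-by-segment approach described above can be sketched roughly as follows in Python (not SAS). The segment edges, the target shares, and the synthetic values are hypothetical, since the thread does not give the actual segments or percentages. Each segment is resampled with replacement to a size proportional to its target share, and the resulting distribution is then checked.

```python
import random
from collections import Counter

# Hypothetical segment edges and target shares (assumptions, not from the thread)
EDGES = [(1, 250), (251, 500), (501, 900)]
TARGET = [0.40, 0.30, 0.30]

def segment_of(value):
    """Return the index of the segment containing value."""
    for i, (lo, hi) in enumerate(EDGES):
        if lo <= value <= hi:
            return i
    raise ValueError(f"value {value} is outside all segments")

def resample_to_target(values, total, seed=0):
    """Resample each segment with replacement so its share of `total` matches TARGET."""
    rng = random.Random(seed)
    pieces = [[] for _ in EDGES]
    for v in values:
        pieces[segment_of(v)].append(v)
    out = []
    for piece, share in zip(pieces, TARGET):
        out.extend(rng.choices(piece, k=round(share * total)))
    return out

# Synthetic data in which the 1..250 segment is underrepresented
data_rng = random.Random(2)
values = [data_rng.randint(1, 250) for _ in range(50)]
values += [data_rng.randint(251, 900) for _ in range(950)]

resampled = resample_to_target(values, total=2000)
counts = Counter(segment_of(v) for v in resampled)
observed = [counts[i] / len(resampled) for i in range(len(EDGES))]
print(observed)
```

Note that heavily oversampling a small segment repeats the same records many times, so the resampled data carries no more information about that segment than the original rows did.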

M_Maldonado
Barite | Level 11

Hi Jon,

A way to do it directly in EM:

On your Data Partition node, click on the Variables ellipsis (...). On the menu you can specify a Partition Role as Stratification.

e.g.

Home Equity IDS->Partition (change Partition Role of 'Reason' from Default to Stratification)

I hope this helps,

Miguel
