Kanyange
Fluorite | Level 6

Hi,

I have a population (A+B) of 1,812,507 customers. I would like to take a sample stratified by the variable Decile. I want to keep all the customers in Targeting A and take a sample of the customers in Targeting B, such that the distribution of Decile in the Targeting B sample is similar to its distribution in Targeting A. For example, for decile 0 I would like a proportion of 4% in Targeting B; at the moment it is 7% (please see the reports below). I am not sure how to proceed in Enterprise Miner. Your help would be very much appreciated.

Many Thanks

 

Targeting   Decile   Frequency   Percent
A+B         0          107,947        6%
A+B         1          125,467        7%
A+B         2          137,295        8%
A+B         3          148,162        8%
A+B         4          162,287        9%
A+B         5          179,042       10%
A+B         6          202,198       11%
A+B         7          226,884       13%
A+B         8          259,218       14%
A+B         9          264,007       15%
Total                1,812,507

   

Targeting   Decile   Frequency   Percent
A           0           30,377        4%
A           1           35,011        5%
A           2           44,019        6%
A           3           62,457        9%
A           4           68,468       10%
A           5           75,773       11%
A           6           85,504       12%
A           7           95,146       14%
A           8          103,738       15%
A           9           94,490       14%
Total                  694,983

                                                             

Targeting   Decile   Frequency   Percent
B           0           77,570        7%
B           1           90,456        8%
B           2           93,276        8%
B           3           85,705        8%
B           4           93,819        8%
B           5          103,269        9%
B           6          116,694       10%
B           7          131,738       12%
B           8          155,480       14%
B           9          169,517       15%
Total                1,117,524
DougWielenga
SAS Employee

It would be helpful if you could provide some context on how Decile is being formed, as I'm not sure I understand what you are trying to accomplish with your sampling. The common use of the word Decile refers to 10% groupings of your data, which would place around 69,500 observations in each of your A deciles and around 111,750 observations in each of your B deciles. However, your A decile frequencies range from about 30,400 to 103,700 and your B decile frequencies range from around 77,500 to 169,500.
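
As a quick check of that arithmetic, here is a minimal Python sketch (plain Python, not Enterprise Miner code) that compares the size each group would have under true 10% groupings with the observed minimum and maximum. The frequencies are copied from the tables in the question; the function name `equal_decile_size` is just an illustrative choice.

```python
# Frequencies for deciles 0-9, copied from the tables in the question.
freq_a = [30377, 35011, 44019, 62457, 68468, 75773, 85504, 95146, 103738, 94490]
freq_b = [77570, 90456, 93276, 85705, 93819, 103269, 116694, 131738, 155480, 169517]


def equal_decile_size(freqs):
    """Size each group would have if the ten groups were true 10% groupings."""
    return sum(freqs) / len(freqs)


for name, freqs in (("A", freq_a), ("B", freq_b)):
    # Print: group, expected size per decile, smallest and largest observed decile.
    print(name, round(equal_decile_size(freqs)), min(freqs), max(freqs))
```

The gap between the expected size and the observed range suggests that Decile was computed on some other population or score, which is why the definition matters here.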

 

If the target variable is a class variable, then the sample is stratified on the target variable by default in the Sampling node of SAS Enterprise Miner. Otherwise, random sampling is performed by default. You also have the ability to add a stratification variable (based on Decile, for instance), but as you increase the number of stratification variables, you might find that you can be balanced with respect to one stratification variable or with respect to another, but not with respect to both simultaneously.
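
If the goal is simply for the B sample to have the same Decile mix as A, the per-stratum sample sizes can be worked out outside Enterprise Miner. The following is a minimal Python sketch under that assumption, not a description of what the Sampling node does internally. It draws `k * freq_a[d]` customers from decile `d` of B, where `k` is limited by the decile in which B is scarcest relative to A. The function name `allocate_b_sample` is hypothetical; the frequencies are taken from the tables in the question.

```python
import math

# Frequencies for deciles 0-9, copied from the tables in the question.
freq_a = [30377, 35011, 44019, 62457, 68468, 75773, 85504, 95146, 103738, 94490]
freq_b = [77570, 90456, 93276, 85705, 93819, 103269, 116694, 131738, 155480, 169517]


def allocate_b_sample(freq_a, freq_b):
    """Per-decile counts to draw from B so that B's decile mix matches A's.

    Drawing k * freq_a[d] rows from decile d of B reproduces A's proportions.
    The largest feasible k is set by the decile where B is scarcest
    relative to A, since no stratum can supply more rows than it holds.
    """
    k = min(b / a for a, b in zip(freq_a, freq_b))
    return [min(b, math.floor(k * a)) for a, b in zip(freq_a, freq_b)]


alloc = allocate_b_sample(freq_a, freq_b)
total = sum(alloc)
for decile, n in enumerate(alloc):
    # Print: decile, number of B customers to draw, resulting share of the sample.
    print(decile, n, f"{n / total:.1%}")
```

With these counts, the binding stratum is decile 5, so the largest B sample that matches A's mix is roughly 947,000 of the 1,117,524 B customers. A smaller sample with the same mix can be obtained by scaling every per-decile count down by a common factor.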

 

It would also be helpful to understand why those percentages need to be balanced. It would not normally be critical for each group to have the same percentage. It is also not clear why A and B need to be modeled together when modeling them separately might produce much better results. Any additional information would be helpful in providing a more detailed response.

 

Thanks!

Doug

 
