Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Sampling in Enterprise Miner, Please Help...Thank you

Frequent Contributor
Posts: 95


Hi,

I have a population (A+B) of 1,812,507 customers. I would like to take a sample stratified by the variable Decile. I want to keep all the customers in Targeting A and take a sample of the customers in Targeting B such that the distribution of Decile in the Targeting B sample is similar to its distribution in Targeting A. For example, for decile 0 I would like a proportion of 4% in Targeting B; at the moment it is 7% (please see the reports below). I am not sure how to proceed in Enterprise Miner. Your help would be very much appreciated.

Many Thanks

 

Targeting  Decile  Frequency  Percent
A+B        0         107,947       6%
A+B        1         125,467       7%
A+B        2         137,295       8%
A+B        3         148,162       8%
A+B        4         162,287       9%
A+B        5         179,042      10%
A+B        6         202,198      11%
A+B        7         226,884      13%
A+B        8         259,218      14%
A+B        9         264,007      15%
Total              1,812,507

   

Targeting  Decile  Frequency  Percent
A          0          30,377       4%
A          1          35,011       5%
A          2          44,019       6%
A          3          62,457       9%
A          4          68,468      10%
A          5          75,773      11%
A          6          85,504      12%
A          7          95,146      14%
A          8         103,738      15%
A          9          94,490      14%
Total                694,983

                                                             

Targeting  Decile  Frequency  Percent
B          0          77,570       7%
B          1          90,456       8%
B          2          93,276       8%
B          3          85,705       8%
B          4          93,819       8%
B          5         103,269       9%
B          6         116,694      10%
B          7         131,738      12%
B          8         155,480      14%
B          9         169,517      15%
Total              1,117,524

SAS Employee
Posts: 109

Re: Sampling in Enterprise Miner, Please Help...Thank you

It would be helpful if you could provide some context on how Decile is being formed, as I'm not sure I understand what you are trying to accomplish with your sampling. The common use of the word Decile refers to 10% groupings of your data, but that would place around 69,500 observations in each of your A deciles and around 111,750 observations in each of your B deciles. Instead, your A decile frequencies range from around 30,400 to 103,700 and your B decile frequencies range from around 77,500 to 169,500 (the combined A+B frequencies range from around 107,900 to 264,000).
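
As a quick check of the equal-sized-decile figures, here is a small Python sketch (not Enterprise Miner code) that divides the group totals from the posted tables by 10. The variable names are only illustrative.

```python
# Group totals copied from the frequency tables in the question.
a_total = 694_983
b_total = 1_117_524

# True deciles would each hold one tenth of the group.
a_per_decile = round(a_total / 10)
b_per_decile = round(b_total / 10)

print(a_per_decile)  # 69498
print(b_per_decile)  # 111752
```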

 

If the target variable is a class variable, then the sample is stratified on the target variable by default in the Sampling node of SAS Enterprise Miner. Otherwise, random sampling is performed by default. You also have the ability to add a stratification variable (based on Decile, for instance), but as you increase the number of stratification variables, you might find that you can be balanced with respect to one stratification variable or with respect to another, but not with respect to both simultaneously.
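
To make the idea of stratifying B on Decile concrete, here is a minimal Python sketch of the arithmetic only; it is not Enterprise Miner code and not necessarily how the Sampling node allocates strata. It assumes the goal is to draw, within each decile of B, a number of customers proportional to that decile's share of A. The function names `b_quotas` and `draw_b_sample`, the seed, and the idea of passing customer IDs grouped by decile are all hypothetical.

```python
import random

# Decile counts copied from the tables in the question (list index = decile 0..9).
A_COUNTS = [30377, 35011, 44019, 62457, 68468, 75773, 85504, 95146, 103738, 94490]
B_COUNTS = [77570, 90456, 93276, 85705, 93819, 103269, 116694, 131738, 155480, 169517]

def b_quotas(a_counts, b_counts, sample_size=None):
    """Per-decile number of B rows to draw so that B's decile mix matches A's."""
    a_total = sum(a_counts)
    a_share = [count / a_total for count in a_counts]
    # Largest B sample for which no decile needs more rows than B actually has.
    max_size = min(b / share for b, share in zip(b_counts, a_share) if share > 0)
    if sample_size is None:
        sample_size = int(max_size)
    elif sample_size > max_size:
        raise ValueError(f"B can support at most {int(max_size)} rows at A's decile mix")
    # Round down so a quota never exceeds the rows available in that decile.
    return [int(sample_size * share) for share in a_share]

def draw_b_sample(b_ids_by_decile, quotas, seed=12345):
    """Simple random sample without replacement inside each decile of B."""
    rng = random.Random(seed)
    return {decile: rng.sample(ids, quotas[decile])
            for decile, ids in b_ids_by_decile.items()}

quotas = b_quotas(A_COUNTS, B_COUNTS)
for decile, quota in enumerate(quotas):
    print(decile, quota, f"{quota / sum(quotas):.1%}")
```

Calling `b_quotas(A_COUNTS, B_COUNTS, 100000)` instead would give the per-decile counts for a B sample of about 100,000 customers. Because the quotas are rounded down, the realized sample can be a few rows smaller than the requested size.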

 

It would also be helpful to understand why those percentages need to be balanced. It would not normally be critical for each group to have the same percentage. It is also not clear why A and B need to be modeled together when modeling them separately might produce much better results. Any additional information would be helpful in providing a more detailed response.

 

Thanks!

Doug

 
