Sampling in Enterprise Miner, Please Help...Thank ...

11-07-2013 01:53 PM

Hi,

I have a population (A+B) of 1,812,507 customers. I would like to take a sample stratified by the variable Decile. I want to keep all the customers in Targeting A and take a sample of the customers in Targeting B, such that the distribution of Decile in the Targeting B sample is similar to its distribution in Targeting A. For example, for decile 0 I would like a proportion of 4% in Targeting B; at the moment it is 7% (please see the reports below). I am not sure how to proceed in Enterprise Miner. Your help would be very much appreciated.

Many Thanks

| Targeting | Decile | Frequency | Percent |
|---|---|---|---|
| A+B | 0 | 107,947 | 6% |
| A+B | 1 | 125,467 | 7% |
| A+B | 2 | 137,295 | 8% |
| A+B | 3 | 148,162 | 8% |
| A+B | 4 | 162,287 | 9% |
| A+B | 5 | 179,042 | 10% |
| A+B | 6 | 202,198 | 11% |
| A+B | 7 | 226,884 | 13% |
| A+B | 8 | 259,218 | 14% |
| A+B | 9 | 264,007 | 15% |
| Total | | 1,812,507 | |

| Targeting | Decile | Frequency | Percent |
|---|---|---|---|
| A | 0 | 30,377 | 4% |
| A | 1 | 35,011 | 5% |
| A | 2 | 44,019 | 6% |
| A | 3 | 62,457 | 9% |
| A | 4 | 68,468 | 10% |
| A | 5 | 75,773 | 11% |
| A | 6 | 85,504 | 12% |
| A | 7 | 95,146 | 14% |
| A | 8 | 103,738 | 15% |
| A | 9 | 94,490 | 14% |
| Total | | 694,983 | |

| Targeting | Decile | Frequency | Percent |
|---|---|---|---|
| B | 0 | 77,570 | 7% |
| B | 1 | 90,456 | 8% |
| B | 2 | 93,276 | 8% |
| B | 3 | 85,705 | 8% |
| B | 4 | 93,819 | 8% |
| B | 5 | 103,269 | 9% |
| B | 6 | 116,694 | 10% |
| B | 7 | 131,738 | 12% |
| B | 8 | 155,480 | 14% |
| B | 9 | 169,517 | 15% |
| Total | | 1,117,524 | |

Friday

It would be helpful if you could provide some context on how Decile is being formed, as I'm not sure I understand what you are trying to accomplish with your sampling. The common use of the word Decile refers to 10% groupings of your data, but that would place around 69,500 observations in each of your A deciles and around 111,750 observations in each of your B deciles. Instead, your A decile frequencies range from around 30,400 to 103,700 and your B decile frequencies range from around 77,500 to 169,500.
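To make that comparison concrete, here is a small Python sketch (not Enterprise Miner code) that computes the size a true 10% group would have and the observed range of decile counts. The counts are copied from the tables in the original post; the helper name `decile_summary` is made up for illustration.

```python
# Decile counts (deciles 0 to 9) copied from the tables in the original post.
a_counts = [30377, 35011, 44019, 62457, 68468, 75773, 85504, 95146, 103738, 94490]
b_counts = [77570, 90456, 93276, 85705, 93819, 103269, 116694, 131738, 155480, 169517]

def decile_summary(counts):
    """Return (size of a true 10% group, smallest count, largest count)."""
    total = sum(counts)
    return total / 10, min(counts), max(counts)

for name, counts in (("A", a_counts), ("B", b_counts)):
    expected, smallest, largest = decile_summary(counts)
    print(f"{name}: a 10% group would hold about {expected:,.0f}; "
          f"observed counts run from {smallest:,} to {largest:,}")
```

If Decile were formed separately within A and within B, every observed count would sit close to the 10% figure, so the spread suggests the deciles were defined on some other base.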

If the target variable is a class variable, then the sample is stratified on the target variable by default in the Sampling node of SAS Enterprise Miner. Otherwise, random sampling is performed by default. You also have the ability to add a stratification variable (based on Decile, for instance), but as you increase the number of stratification variables, you might find that you can be balanced with respect to one stratification variable or another, but not with respect to both simultaneously.
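If the goal really is to make the Decile mix of the B sample match A, the arithmetic can be sketched outside Enterprise Miner. The Python below is only an illustration under my reading of the question, not a description of what the Sampling node does: it finds the largest per-decile sample sizes from B whose proportions match A's, then draws a simple random sample of that size within each decile. The function names and the `ids_by_decile` structure are hypothetical.

```python
import random
from fractions import Fraction

# Decile counts copied from the tables in the original post.
a_counts = dict(enumerate([30377, 35011, 44019, 62457, 68468,
                           75773, 85504, 95146, 103738, 94490]))
b_counts = dict(enumerate([77570, 90456, 93276, 85705, 93819,
                           103269, 116694, 131738, 155480, 169517]))

def matched_allocation(target_counts, pool_counts):
    """Largest per-stratum sample sizes from the pool that follow the target mix.

    The sizes are scale * target_counts[d], where scale is the largest factor
    that no stratum of the pool runs out of records for. Exact fractions avoid
    floating-point rounding in the binding stratum.
    """
    scale = min(Fraction(pool_counts[d], target_counts[d]) for d in target_counts)
    return {d: int(scale * target_counts[d]) for d in target_counts}

def draw_sample(ids_by_decile, allocation, seed=0):
    """Simple random sample without replacement inside each decile."""
    rng = random.Random(seed)
    return {d: rng.sample(ids_by_decile[d], allocation[d]) for d in allocation}

allocation = matched_allocation(a_counts, b_counts)
total = sum(allocation.values())
for d, n in allocation.items():
    print(f"decile {d}: take {n:,} of {b_counts[d]:,} from B ({n / total:.1%} of the sample)")
```

With these counts, the decile where B is scarcest relative to A limits the overall sample size, so a smaller total sample is also possible by scaling every stratum down by the same factor.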

It would also be helpful to understand why those percentages need to be balanced. It would not normally be critical for each group to have the same percentage. It is also not clear why A and B need to be modeled together when modeling them separately might produce much better results. Any additional information would be helpful in providing a more detailed response.

Thanks!

Doug