05-15-2018 04:25 PM
I need to build a program that does sampling.
The problem I am facing is that the sampling plan varies depending on the case.
Some cases require full random sampling. Other cases need part of the observations removed first, followed by stratified sampling, with the strata defined by indicators.
My questions are:
Has anyone done similar work? How do you train your program to learn which sampling method to use?
Can you please share your program?
05-17-2018 01:51 PM - edited 05-17-2018 01:52 PM
This is a very open-ended question. From how you describe it, different sampling methods will lead to better (or worse) models, depending on "the case". What are the different "cases" you are referring to? What is the application? Are you training models for different customer segments, for example? Do you have different problem statements/objectives, where sometimes you are modeling rare events and sometimes the data is more balanced? And how are you defining the level of effectiveness of the different "sampling plans"? How are you validating your models?
Lots of questions. I think we need more details about what you are trying to do here, and what you are observing when you try one sampling method vs another.
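While those details get sorted out, here is a rough Python sketch of the two plans mentioned in the question: a simple random sample, and a stratified sample taken after some observations are filtered out. It does not "learn" which method to use; it assumes each case comes with an explicit plan that names the method. All names here (`simple_random_sample`, `stratified_sample`, `draw_sample`, and the `indicator` and `valid` fields) are made up for illustration, and the per-stratum sampling fraction is just one of several reasonable allocation rules.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, n, seed=None):
    """Draw n rows without replacement (or all rows if fewer than n exist)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def stratified_sample(rows, strata_key, frac, exclude=None, seed=None):
    """Drop excluded rows, then sample the same fraction from each stratum.

    strata_key: name of the field whose value defines the stratum (assumed).
    exclude:    optional function returning True for rows to remove first.
    """
    rng = random.Random(seed)
    kept = [r for r in rows if not (exclude and exclude(r))]

    # Group the remaining rows by their stratum value.
    strata = defaultdict(list)
    for r in kept:
        strata[r[strata_key]].append(r)

    sample = []
    for key in sorted(strata):  # sorted so a fixed seed gives a repeatable draw
        group = strata[key]
        k = max(1, round(len(group) * frac))  # at least one row per stratum
        sample.extend(rng.sample(group, k))
    return sample


def draw_sample(rows, plan, seed=None):
    """Dispatch on a hypothetical per-case plan dictionary."""
    if plan["method"] == "random":
        return simple_random_sample(rows, plan["n"], seed=seed)
    if plan["method"] == "stratified":
        return stratified_sample(
            rows, plan["strata_key"], plan["frac"],
            exclude=plan.get("exclude"), seed=seed,
        )
    raise ValueError(f"unknown sampling method: {plan['method']!r}")


# Toy data: 20 rows, two strata, every fifth row flagged as not valid.
rows = [
    {"id": i, "indicator": "A" if i < 10 else "B", "valid": i % 5 != 0}
    for i in range(20)
]
random_plan = {"method": "random", "n": 5}
strata_plan = {
    "method": "stratified",
    "strata_key": "indicator",
    "frac": 0.25,
    "exclude": lambda r: not r["valid"],
}
random_draw = draw_sample(rows, random_plan, seed=1)
strata_draw = draw_sample(rows, strata_plan, seed=1)
```

Keeping the plan as data rather than code means a new case only needs a new plan entry. Whether the choice of plan can be automated at all depends on the answers to the questions above.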