Hello. I was presented with a "sampling solution" by someone else and was asked whether there was anything wrong with it. The issue is that I have no idea. The solution "feels" wrong, but I cannot find any logical problem with it. Can anyone take a look and let me know if there are any issues with the designed solution?

Background

Basically, a company wants to perform 15+ statistical tests on a population. For each test they want to use a specific distribution (let's assume normal) with a 5% assumed error rate, a 2% margin of error, and 95% confidence (population size of 10,000, with the finite population correction factor). Each test will then require 437 loans. HOWEVER, one loan can be used in multiple tests. An example of a test would be Test 1 = "% of loans where the person's name was not misspelled" and Test 2 = "% of loans with a balance under 50,000". One loan CAN have both a balance and a name, so it can fit into both buckets. However, not all loans will fit into all buckets (some loans might not have a person's name, for example).

The company that came to the auditors sees that the number of loans required is 15 * 437, or 6,555 (15 independent random samples of the 437 noted above). They are only willing to do the work if the sample size is ~1,000 loans at maximum. To allow for this, the auditing company came up with the following solution.

Solution that I question:

First, take a random sample of X loans (437). Since one loan can fall into multiple buckets, look at each of the 437 loans and place it into every bucket whose required attributes it has (e.g., person's name and loan balance). In our example above, the one loan would go into both the Name test and the Balance test. Therefore this loan is 1 out of the 437 needed in EACH bucket. Then, per bucket, once you have 437 loans you are done. However, many of the 15 buckets will not have 437 loans (because it is highly likely that fewer than 437 of the 437 selected loans will apply to the particular test bucket.
For example, maybe only 200 loans have a recorded person's name, so the other 237 cannot be used in the Test 1 bucket). At this point, find out how many more loans you need to reach 437 per test, and simply sample that many more loans per test. That is, if Test 1 was undersampled by 237 loans, sample 237 more loans from the population that meet the requirements for Test 1. Then repeat the process for Test 2, etc. By doing this, the original sample of 437 loans can be used across multiple tests, and you only fill in the loans that are missing. In addition, the company says each test is still a statistically random sample and would hold up to third-party scrutiny.

Questions

Question 1) What mistakes (if any) did the auditing company make?
Question 2) Is the sample a simple random sample? Is it a random sample at all?
Question 3) Is there anything mistaken with this methodology, assuming the company just needs a random sample and not a simple random sample?
Question 4) Do the associated samples still achieve the required 95% confidence and 2% margin of error for each of the 15 tests, even though loans were shared between tests?
Question 5) If the solution given is incorrect, is there any solution that will allow for 15 tests at the required specifications with a total of fewer than 1,000 loans?

Please let me know if my question did not make sense!
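
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic behind the setup. It assumes the usual normal-approximation sample size for a proportion with z = 1.96 and the standard finite population correction; the post does not state the formula, but these assumptions reproduce the 437 figure. The function names and the Test 2 count are hypothetical and only for illustration; the 200 figure for Test 1 is the example from the post.

```python
import math


def sample_size(p, margin, z=1.96, population=None):
    """Sample size for estimating a proportion (normal approximation).

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, optionally
    adjusted by the finite population correction n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def top_up_counts(required, applicable_in_first_sample):
    """Extra loans each test still needs after the shared first sample."""
    return {
        test: max(0, required - hits)
        for test, hits in applicable_in_first_sample.items()
    }


# 5% assumed error rate, 2% margin of error, 95% confidence, N = 10,000
n = sample_size(p=0.05, margin=0.02, population=10_000)
print(n)       # 437
print(15 * n)  # 6555 loans if all 15 samples are drawn independently

# Test 1 count (200) is the example from the post; Test 2 count is hypothetical.
extra = top_up_counts(n, {"Test 1": 200, "Test 2": 437})
print(extra)   # {'Test 1': 237, 'Test 2': 0}
```

This only reproduces the counts in the proposed scheme. It says nothing about whether the combined "shared sample plus top-up" loans for each test behave like a random sample, which is what the questions above are asking.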