06-07-2012 09:35 AM
I am trying to select a sample of 33 stores from a universe of 340 stores. My problem is that I have to select at the market level, because every store within a selected market needs to be part of the test. When I create my sample at the market level, the sample size is counted in markets rather than stores. The number of stores per market varies from 1 to 49. How do I select markets so that I end up with a sample of 33 stores?
Here's my data and code:
data all;
input market $ _final_sqft _market_share _tv_eff_index total_sales store;
datalines;
A 2000 3.5 1.4 2345 60
;
run;
/* One row per market. pct_idd is not read by the INPUT statement above, so it must come from the full data. */
proc means data=all nway noprint;
class market;
var _final_sqft _market_share _tv_eff_index pct_idd total_sales store;
output out=market_list sum(total_sales)= mean(_final_sqft _market_share _tv_eff_index pct_idd)= n(store)=;
run;
proc freq data=all noprint;
tables market/ list missing out=msampsize;
run;
DATA msampsize2 ERROR_m;
set msampsize;
SAMPNUM=(PERCENT * 33)/100;
/* The statement that assigns _NSIZE_ is missing from the post. */
IF _NSIZE_=0 THEN OUTPUT ERROR_m;
ELSE OUTPUT msampsize2;
run;
proc sort data=market_list;
by market descending _final_sqft _market_share descending _tv_eff_index total_sales descending pct_idd store;
run;
/* The rest of this statement was cut off in the post. */
proc surveyselect data=market_list sampsize=msampsize2
seed=40070 out=SampleRep minsize=2 maxsize=7
06-07-2012 11:42 AM
This sounds like you may want to use MARKET as the strata variable.
I don't know how many MARKETs you have, but suppose you have 5. You can specify how many stores to sample from each stratum by listing the counts in your SAMPSIZE clause (probably proportional to each market's share of all stores):
Sampsize = ( 6 7 6 7 7)
This will select 6 stores from the first stratum (the lowest value of the strata variable), 7 from the second, 6 from the third, and 7 each from the fourth and fifth strata.
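For context, here is a minimal sketch of how that SAMPSIZE list might sit in a full call. It assumes the store-level data set is ALL, that there are exactly five markets, and that each has at least as many stores as its requested count; the output name SampleStores is made up for illustration.

```sas
/* STRATA requires the input to be sorted by the strata variable */
proc sort data=all;
  by market;
run;

/* Simple random sample of stores within each market (stratum).        */
/* The five counts are assigned to the markets in sorted order and sum */
/* to 33. SampleStores is a hypothetical output data set name.         */
proc surveyselect data=all method=srs
                  sampsize=(6 7 6 7 7)
                  seed=40070 out=SampleStores;
  strata market;
run;
```

Note that this samples stores within every market, so it does not keep markets whole.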
06-07-2012 11:56 AM
I don't want to use market as my strata variable, because I don't want to select a sample of stores within each market. I want to select markets, but use the store count as a measure of size. I am not sure if I am explaining this clearly.
06-07-2012 12:59 PM
So what you want is something like the following:
Market D : # of stores = 5
Market L: # of stores = 10
Market B: # of stores = 3
All Markets Selected: # of stores = 33
That is, the store counts of the selected markets add up to 33?
So your total number of stores needs to add to 33, and you're selecting all stores within a market? You might be limiting yourself with that criterion, because there might be only a few combinations of markets that meet it at all.
06-07-2012 02:21 PM
I don't really care how many markets I select, I just wanted to select 33 stores. But the caveat is that if I select one store in a market, then I have to select all the stores in that market. Not sure if this can be accomplished by proc surveyselect.
06-07-2012 02:47 PM
But if you select one store in a market and then all the stores in that market, is it ok if the total sample is over 33, or is that a constraint?
Can you choose 33 stores and then take every store in the markets those stores belong to? My guess is that your sample would then be very large.
My suggestion would be to randomly select markets such that the total number of stores is 10-15% of the store universe.
You run the risk of selecting a single market though, with 33 stores.
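One way to do that, as a rough sketch rather than a tested solution: put the markets in random order and keep each one only if it still fits under the store target. The data set names market_sizes, shuffled, picked_markets, all_sorted and sample_stores, and the variables n_stores, u and cum_stores, are made up for illustration; ALL is the store-level data from the original post and 33 is the target from the question.

```sas
/* Store count per market; COUNT and PERCENT are the standard */
/* variables in a PROC FREQ OUT= data set                     */
proc freq data=all noprint;
  tables market / out=market_sizes(rename=(count=n_stores) drop=percent);
run;

/* Attach a random sort key so markets are considered in random order */
data shuffled;
  set market_sizes;
  call streaminit(40070);
  u = rand('uniform');
run;

proc sort data=shuffled;
  by u;
run;

/* Keep a market only if adding it does not push the total past 33 */
data picked_markets;
  set shuffled;
  retain cum_stores 0;
  if cum_stores + n_stores <= 33 then do;
    cum_stores = cum_stores + n_stores;
    output;
  end;
  drop u;
run;

/* Pull every store that belongs to a picked market */
proc sort data=picked_markets;
  by market;
run;

proc sort data=all out=all_sorted;
  by market;
run;

data sample_stores;
  merge all_sorted(in=in_all) picked_markets(in=in_pick keep=market);
  by market;
  if in_all and in_pick;
run;
```

Because markets that would overshoot the target are skipped, this favors smaller markets and is not an equal-probability design. The total can also come in under 33.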
06-07-2012 03:59 PM
That's exactly what I ended up doing: selecting markets instead of stores. My sample ended up with 29 stores after I excluded markets with an extremely high number of stores. I think the serpentine sort helped spread the selected markets more evenly in terms of number of stores. Thank you for your suggestions.
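For anyone who finds this later, a market-level selection along those lines might look roughly like the sketch below. It is not my exact code: the cutoff of 20 stores, the sample of 6 markets, and the names small_markets and SampleMarkets are placeholders to adjust. It assumes market_list has one row per market with the store count in STORE, as produced by the PROC MEANS step in my first post.

```sas
/* Drop very large markets first (the cutoff of 20 is a placeholder) */
data small_markets;
  set market_list;
  where store <= 20;
run;

/* Systematic selection of whole markets. The CONTROL variables are   */
/* sorted in serpentine order (SORT=SERP) before selection, which     */
/* spreads the picks across the range of store counts and sizes.      */
/* SAMPSIZE=6 is a placeholder for the number of markets to select.   */
proc surveyselect data=small_markets method=sys sampsize=6
                  seed=40070 sort=serp out=SampleMarkets;
  control store _final_sqft;
run;

/* Check how many stores the selected markets contain in total */
proc means data=SampleMarkets sum;
  var store;
run;
```

The store total depends on which markets are drawn, so expect to adjust the number of markets or the cutoff until the total lands near the target.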