I am trying to select a sample of 33 stores out of a universe of 340 stores. My problem is that I have to select the stores at the market level, because every store within a selected market needs to be part of the test. When I create my sample at the market level, the sample size is also set at the market level. Each market has a varying number of stores, ranging from 1 to 49. How do I select markets so that I end up with a sample of 33 stores?
Here's my data and code:
data all;
input market $ _final_sqft _market_share _tv_eff_index total_sales store;
cards;
A 2000 3.5 1.4 2345 60
etc..
;
run;
proc means data=all nway noprint;
class market;
var _final_sqft _market_share _tv_eff_index total_sales store;
output out=market_list sum(total_sales)= mean(_final_sqft _market_share _tv_eff_index pct_idd)= n(store)=;
run;
proc freq data=all noprint;
tables market/ list missing out=msampsize;
run;
DATA msampsize2 ERROR_m;
SET msampsize;
SAMPNUM=(PERCENT * 33)/100;
_NSIZE_= ROUND(SAMPNUM,1);
SAMPNUM=ROUND(SAMPNUM,.01);
IF _NSIZE_=0 THEN OUTPUT ERROR_m;
IF _NSIZE_=0 THEN DELETE;
OUTPUT msampsize2;
run;
proc sort data=market_list;
by market descending _final_sqft _market_share descending _tv_eff_index total_sales descending pct_idd store;
run;
proc surveyselect data=market_list sampsize=msampsize2
method=pps
seed=40070 out=SampleRep minsize=2 maxsize=7
outsize;
size store;
run;
It sounds like you may want to use MARKET as a stratum variable.
I don't know how many markets you have, but suppose you have 5. You can specify how many stores to sample from each stratum by changing your SAMPSIZE= option to list the counts (probably proportional to each market's share of all stores),
something like:
Sampsize = ( 6 7 6 7 7)
then add
strata market;
This will select 6 stores from the first stratum (the lowest value of the strata variable), 7 from the second, 6 from the third, and 7 each from the fourth and fifth strata.
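Put together, the stratified version described above might look like the sketch below. It assumes exactly five markets, each with at least as many stores as requested from it, and the data set names `all_sorted` and `store_sample` are made up for illustration.

```sas
/* Sketch only: assumes exactly five markets, each holding at least
   as many stores as are requested from it */
proc sort data=all out=all_sorted;
  by market;                      /* STRATA expects the input sorted by the strata variable */
run;

proc surveyselect data=all_sorted method=srs
                  sampsize=(6 7 6 7 7)   /* one count per market, in sort order */
                  seed=40070 out=store_sample;
  strata market;
run;
```

Note that this picks individual stores inside every market, so it does not keep markets whole.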
I don't want to use market as my strata as I don't want to select a sample of stores within the market. I want to select markets but use a store count to determine a measure of size. I am not sure if I am explaining this clearly.
So what you want is something like the following:
Market D : # of stores = 5
Market L: # of stores = 10
Market B: # of stores = 3
...
All Markets Selected: # of stores = 33
Such that the sample totals of the markets add to 33?
So your total number of stores needs to add up to 33, and you're selecting all stores within a market? You might be limiting yourself with that criterion, because there might be only a few samples that meet it at all.
I don't really care how many markets I select, I just wanted to select 33 stores. But the caveat is that if I select one store in a market, then I have to select all the stores in that market. Not sure if this can be accomplished by proc surveyselect.
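For what it's worth, PROC SURVEYSELECT can select whole markets as clusters through the SAMPLINGUNIT statement, provided your SAS/STAT release supports it. The sketch below is only an illustration: `n=5` is a hypothetical number of markets, `market_sample` is a made-up data set name, and the input is assumed to be the store-level data set `all` with one row per store. The number of stores you end up with depends on which markets are drawn, so it will not necessarily be 33.

```sas
/* Sketch only: n=5 is a hypothetical number of markets, not a value from this thread */
proc surveyselect data=all method=srs n=5
                  seed=40070 out=market_sample;
  samplingunit market;   /* each market is selected or skipped as a whole */
run;

/* Check how many stores the selected markets contribute */
proc sql;
  select market, count(*) as n_stores
  from market_sample
  group by market;

  select count(*) as total_stores
  from market_sample;
quit;
```

If the total is too far from 33, you could rerun with a different `n=` or seed.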
But if you select one store in a market and then all the stores in that market, is it ok if the total sample is over 33, or is that a constraint?
Can you choose 33 stores and then go choose all of their markets? My guess is that your sample would then be very large.
My suggestion would be to randomly select markets such that the total number of stores is 10-15% of your store universe.
You run the risk of selecting a single market though, with 33 stores.
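One rough way to do that is to shuffle the markets and keep adding whole markets until the store total would pass the target. The sketch below assumes the store-level data set `all` has one row per store; the names `market_sizes`, `shuffled`, and `picked`, the target of 33, and the rule of dropping markets larger than the target are all assumptions made for illustration. It is a convenience procedure, not a design with known selection probabilities.

```sas
/* Sketch only: target of 33 stores and all data set names are assumptions */
proc sql;
  create table market_sizes as
  select market, count(*) as n_stores
  from all
  group by market;
quit;

data shuffled;
  set market_sizes;
  if _n_ = 1 then call streaminit(40070);  /* fixed seed for a repeatable shuffle */
  if n_stores <= 33;                       /* drop markets that could never fit */
  u = rand('uniform');                     /* random sort key */
run;

proc sort data=shuffled;
  by u;
run;

data picked;
  set shuffled;
  /* keep a market only if it still fits under the target */
  if total_stores + n_stores <= 33 then do;
    total_stores + n_stores;               /* sum statement: running total, starts at 0 */
    output;
  end;
run;
```

The last row of `picked` shows the final store total, which may fall short of 33 when no remaining market fits exactly.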
That's exactly what I ended up doing: selecting markets instead of stores. My sample ended up with 29 stores once I excluded markets with an extremely high number of stores. I think the serpentine sort helped make the market selection more evenly distributed in terms of number of stores. Thank you for your suggestions.