BookmarkSubscribeRSS Feed
hamhocks
Calcite | Level 5

I have a client test group that differs based on store location and client spend group ( high, medium low).

 

I have a larger group of similar control clients created based on past history.  I would like to select a random control client for each test client that shares the same store location and spend level.

 

example of tables:

 

test:  

 

client id store_number spend_level
1 30 1
2 5 1
3 10 2
4 8 2
5 8 3

 

 

I have a potential control group that looks like this but with many more clients, and not weighted the same by store and spend.

How do I create a control sample that has the same number of clients by store and spend.

 

So if Store 3 has 10 low spend, 20 high spend and 3 medium spend clients, and store 5 has 2 low spend, 1 high spend, and 8 medium spend clients, the control will have the same.

I'd like this sample to be random within these conditions. 

 

thank you in advance!

Meredith

 

 

 

3 REPLIES 3
sbxkoenk
SAS Super FREQ

You can do that with PROC SURVEYSELECT by specifying a secondary input data set (SAMPSIZE=).

 

PROC SURVEYSELECT data=input_SAS_dataset_ToSampleFrom /* control clients */

     SAMPSIZE=Secondary_Input_Data_Set

     out=SampleSRSwithinStrata

     method=srs ; /* srs = Simple Random Sampling */

strata store_number spend_level;

run;

 

You can provide stratum sample sizes in the _NSIZE_ (or SampleSize) variable in the SAMPSIZE= data set.

The secondary input data set must contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the secondary data set as in the DATA= data set.

 

SAS/STAT(R) 14.1 User's Guide
The SURVEYSELECT Procedure
secondary input data set
http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_surveyselect...

 

You can get the required stratum sample sizes with a PROC FREQ on your client test group.

PROC FREQ; tables store_number * spend_level / out=count; run;

 

Good luck with this!

 

sbxkoenk
SAS Super FREQ

I forgot to say that, for the SURVEYSELECT solution, you should put a SEED= value that is a (arbitrary) strictly positive integer (>0). Otherwise your results are not reproducible. 

 

From the online doc ...

SEED=number
specifies the initial seed for random number generation. The SEED= value must be a positive integer. If you do not specify the SEED= option, or if the SEED= value is negative or 0, PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed.

 

Koen 

PGStats
Opal | Level 21

You can also do your own randomisation and gain greater control

 

proc sql;
create table numbers as
select store_number, spend_level, count(*) as n
from test
group by store_number, spend_level;
quit;

data rndControls;
call streaminit(78576);
set controls;
rnd = rand("UNIFORM");
run;

proc sort data=rndControls; by store_number spend_level rnd; run;

data matchControls;
merge numbers rndControls (in=inControls); by store_number spend_level;
if first.spend_level then count = 0;
if count <= n then do;
    count + inControls;
    if last.spend_level and count < n 
        then missing = n - count;
    output;
    end;
keep store_number spend_level clientId missing;
run;

proc print data=matchControls noobs; run;

 

the idea is to put the controls in random order within store_number and spend_level groups and then pick the first control clients that are required to match the test groups. 

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1728 views
  • 0 likes
  • 3 in conversation