Statistical Procedures

hamhocks · Posted 11-23-2015 04:47 PM

I have a client test group that differs based on store location and client spend group ( high, medium low).

I have a larger group of similar control clients created based on past history. I would like to select a random control client for each test client that shares the same store location and spend level.

example of tables:

test:

client id	store_number	spend_level
1	30	1
2	5	1
3	10	2
4	8	2
5	8	3

I have a potential control group that looks like this but with many more clients, and not weighted the same by store and spend.

How do I create a control sample that has the same number of clients by store and spend.

So if Store 3 has 10 low spend, 20 high spend and 3 medium spend clients, and store 5 has 2 low spend, 1 high spend, and 8 medium spend clients, the control will have the same.

I'd like this sample to be random within these conditions.

thank you in advance!

Meredith

sbxkoenk · Posted 11-23-2015 06:30 PM

You can do that with PROC SURVEYSELECT by specifying a secondary input data set (SAMPSIZE=).

PROC SURVEYSELECT data=input_SAS_dataset_ToSampleFrom /* control clients */

SAMPSIZE=Secondary_Input_Data_Set

out=SampleSRSwithinStrata

method=srs ; /* srs = Simple Random Sampling */

strata store_number spend_level;

run;

You can provide stratum sample sizes in the _NSIZE_ (or SampleSize) variable in the SAMPSIZE= data set.

The secondary input data set must contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the secondary data set as in the DATA= data set.

SAS/STAT(R) 14.1 User's Guide
The SURVEYSELECT Procedure
secondary input data set
http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_surveyselect...

You can get the required stratum sample sizes with a PROC FREQ on your client test group.

PROC FREQ; tables store_number * spend_level / out=count; run;

Good luck with this!

sbxkoenk · Posted 11-24-2015 04:36 AM

I forgot to say that, for the SURVEYSELECT solution, you should put a SEED= value that is a (arbitrary) strictly positive integer (>0). Otherwise your results are not reproducible.

From the online doc ...

SEED=number
specifies the initial seed for random number generation. The SEED= value must be a positive integer. If you do not specify the SEED= option, or if the SEED= value is negative or 0, PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed.

Koen

PGStats · Posted 11-23-2015 10:05 PM

You can also do your own randomisation and gain greater control

proc sql;
create table numbers as
select store_number, spend_level, count(*) as n
from test
group by store_number, spend_level;
quit;

data rndControls;
call streaminit(78576);
set controls;
rnd = rand("UNIFORM");
run;

proc sort data=rndControls; by store_number spend_level rnd; run;

data matchControls;
merge numbers rndControls (in=inControls); by store_number spend_level;
if first.spend_level then count = 0;
if count <= n then do;
    count + inControls;
    if last.spend_level and count < n 
        then missing = n - count;
    output;
    end;
keep store_number spend_level clientId missing;
run;

proc print data=matchControls noobs; run;

the idea is to put the controls in random order within store_number and spend_level groups and then pick the first control clients that are required to match the test groups.

PG

Statistical Procedures

Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Follow Us

What is...

Statistical Procedures

Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Re: Creating a conditional one to one random control to match a test group

Our biggest data and AI event of the year.

Follow Us

What is...