
Stratified Random Sampling by sub groups

New Contributor
Posts: 2


Hi,

 

I am working on a large data set for my thesis and need help with stratified random sampling within groups. The data set has a Client variable with four groups: Female_cases, Male_cases, Female_control and Male_control. I want to select all the male and female cases, and for each case I want to match 4 controls of the same sex on age and race, i.e. 4 Female_control records for each Female_cases record and 4 Male_control records for each Male_cases record.

 

ID   Client           Race    Age   Hospitals ID   Services
1    Female_cases     Black   45    000152         PS
2    Male_cases       White   34    000121         HS
3    Female_control   Asian   50    000542         HS
4    Male_control     White   44    000199         HS

 

I should add that I am using SAS University Edition.

Esteemed Advisor
Posts: 5,627

Re: Stratified Random Sampling by sub groups

Posted in reply to zohraafzal

How do you want to match ages? Do you want exact matches, matches within classes (21-25, 26-30, ...), or something else?
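If matching within classes is what you need, one option is to derive a band variable and match on that instead of on age itself. A minimal sketch, assuming 5-year bands aligned as 21-25, 26-30 and so on, and ages of at least 1; the data set name `people` and the variable `ageclass` are made up for illustration:

```sas
/* Toy data; ids, races and ages are made up */
data people;
    input id race $ age;
    /* Lower bound of the 5-year band: 21-25 -> 21, 26-30 -> 26, ... */
    ageclass = 5 * floor((age - 1) / 5) + 1;
    datalines;
1 A 21
2 A 25
3 B 26
4 B 30
;

proc print data=people; run;
```

Cases and controls that share the same `ageclass` value fall in the same age band, so `ageclass` can stand in for `age` wherever an exact match would otherwise be required.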

PG
Super User
Posts: 24,027

Re: Stratified Random Sampling by sub groups

PROC psmatch?

New Contributor
Posts: 2

Re: Stratified Random Sampling by sub groups

Thank you for your message and sorry for the late reply!

I will use the code you suggested and see what happens.

 

Esteemed Advisor
Posts: 5,627

Re: Stratified Random Sampling by sub groups

Posted in reply to zohraafzal

Here is a simple approach for exact race and age matching:

 

data cases;
input id race $ age;
datalines;
1 A 21
3 B 31
4 B 31
;

data control;
input id race $ age;
datalines;
5 A 18
6 A 21
7 A 21
8 B 10
9 B 31
10 B 31
11 B 31
12 B 32
;

/* Create one copy of each case per control wanted (2 here; the question asks for 4) */
data cases2;
set cases;
do i = 1 to 2;
    output;
end;
drop i;
run;

/* Put the controls in random order */
data controlr;
set control;
rnd = rand('uniform');
run;

proc sort data=controlr; by race age rnd; run;

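/* Match cases and controls (both inputs must be sorted by race and age). */
/* When a race/age group runs out of controls, MERGE keeps the last       */
/* control's values; the LAG test blanks controlId so no control is reused. */
data sample;
merge 
    cases2 (in=inCases)
    controlr (rename=(id=controlId));
by race age;
if controlId = lag(controlId) then controlId = .;
if inCases;
drop rnd;
run;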

proc print data=sample; run;
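
The same idea can be adapted to the layout in the question, with sex added as a third matching variable and 4 controls per case. This is only a sketch under some assumptions: the toy values, the macro variable `k`, the seed, and the names `clients`, `sex` and `slot` are all made up for illustration.

```sas
/* Toy data in the layout of the question (values are made up) */
data clients;
    input id client :$20. race $ age;
    datalines;
1 Female_cases Black 45
2 Male_cases White 34
3 Female_control Black 45
4 Female_control Black 45
5 Female_control Asian 50
6 Male_control White 34
7 Male_control White 44
;

%let k = 4;  /* controls wanted per case */

/* Split Client into sex and case/control status */
data cases controls;
    set clients;
    length sex $6;
    sex = scan(client, 1, '_');
    if upcase(scan(client, 2, '_')) =: 'CASE' then output cases;
    else output controls;
    drop client;
run;

/* One row (slot) per control wanted for each case */
data cases2;
    set cases;
    do slot = 1 to &k;
        output;
    end;
run;

proc sort data=cases2; by sex race age id slot; run;

/* Random order within each sex/race/age group */
data controlr;
    call streaminit(2024);  /* arbitrary seed, only for reproducibility */
    set controls;
    rnd = rand('uniform');
run;

proc sort data=controlr; by sex race age rnd; run;

/* Pair slots with controls; leftover slots get a missing controlId */
data sample;
    merge cases2 (in=inCases)
          controlr (keep=id sex race age rename=(id=controlId));
    by sex race age;
    if controlId = lag(controlId) then controlId = .;
    if inCases;
    drop slot;
run;

proc print data=sample; run;
```

Every case is kept, with one row per slot. A case with fewer than 4 exact matches gets a missing controlId in the unfilled rows. When several cases share the same sex, race and age, the earlier ones are filled first, so check for missing controlId values before analysis.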
PG