Not applicable
Posts: 0

# Help with stratified random samples

Hello,

Here is my question. My data has stratified structure. And I want to create some stratified random samples from my data.

Suppose my original data look like this: I have three groups of people. Group 1 and group 3 has two people in it and group 2 has three people in it. For each people I have three variables: X, Y, and Z.

Group ID X Y Z
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3

I want to select a random sample of groups with replacement, say 1,1,2, or 1, 3, 3. Then keep all observations within that group. So the sample I create should look this way (1, 1, 2):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
1 11 3 8 9 2
1 12 4 10 16 2
2 21 1 5 6 3
2 22 2 7 7 3
2 23 5 6 12 3

Note that I need a variable (i.e., NewGroup) indicating that the first and second two lines belong to different units.

Or (1, 3, 3):
Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
3 31 8 6 7 2
3 32 9 4 3 2
3 31 8 6 7 3
3 32 9 4 3 3

I am wondering how to get this type of random samples.

My second question is similar but slightly more complicated. In the first step I get a random sample of groups with replacement, say (1, 1, 2). In the second step, I randomly select people from the every group I get with replacement, say ((11, 12), (12, 12), (21, 22, 22)), or ((11, 11), (11, 12), (21, 21, 23)). And my output data should look like this ((11, 12), (12, 12), (21, 22, 22)):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
1 12 4 10 16 2
1 12 4 10 16 2
2 21 1 5 6 3
2 22 2 7 7 3
2 22 2 7 7 3

Or ((11, 11), (11, 12), (21, 21, 23)):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 11 3 8 9 1
1 11 3 8 9 2
1 12 4 10 16 2
2 21 1 5 6 3
2 21 1 5 6 3
2 23 5 6 12 3

How can I achieve that?

Thank you very much for your help!
Not applicable
Posts: 0

## Re: Help with stratified random samples

Hello Sliu,

Could you take a look at this piece of code?

data T01_data;
infile cards;
input group id x y z;
cards;
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3
;
run;

* First question;

proc sql;
create table T02_groups as
select distinct group
from T01_data;
quit;

proc surveyselect data=T02_groups
out=T03_sample(drop=numberHits)
sampsize=3 method=urs outhits;
run;

data T04_newgroup;
set T03_sample;
newGroup+1;
run;

proc sql;
create table T05_final as
select A.*, B.newGroup
from T01_data A, T04_newGroup B
where A.group=B.group;
quit;

For the second question ... how does it come that you get 2 observations, 2 observations and then 3 observations in ((11, 12), (12, 12), (21, 22, 22)) ? I don't get it,

Regards,

Yoba
Not applicable
Posts: 0

## Re: Help with stratified random samples

Hi Yoba,

Thanks for your code! I will take a look.

For your question: in my original data group 2 has three observations. So when I select (1, 1, 2), I want to have correspondingly two obs in newgroup 1 (originally from group 1), two obs in newgroup 2 (originally from group 1), and three obs in newgroup 3 (original from group 2). Does this help? Sorry, I did not make it clear.

Songqi
Not applicable
Posts: 0

## Re: Help with stratified random samples

Hello Songqi,

I'm not too sure whether I understood what had to be done. Nevertheless, this was great fun ;-)

Regards,

Yoba

* Second question;

proc sql;
create table T02_groups as
select group, count(*) as count
from T01_data
group by 1;
quit;

proc surveyselect data=T02_groups
out=T03_sample(rename=(count=_NSIZE_) drop=numberHits)
sampsize=3 method=urs outhits;
run;

data T04_newGroup;
set T03_sample;
newGroup+1;
run;

proc sql;
create table T05_data as
select A.*, B.newGroup
from T01_data A, T04_newGroup B
where A.group=B.group
order by group, newGroup;
quit;

proc surveyselect data=T05_data
sampsize=T04_newGroup method=urs outhits
out=T06_sample_sample;
strata group newGroup;
run;
Not applicable
Posts: 0

## Re: Help with stratified random samples

Hello yoba,

This works! Thanks a lot!

The only change I want to make is to change the line:
sampsize=3 to sampsize="number of groups in the original dataset"
So, when I have more groups, the code will be more flexible.

For example, when my data have five groups of people:
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3
4 41 2 8 11
4 42 3 4 5
4 43 5 3 4
4 44 6 7 10
5 51 4 3 2
5 52 3 4 9
I don't need to change the sentence to "sampsize=5".
Can you do that? Thank you!
Not applicable
Posts: 0

## Re: Help with stratified random samples

Hello Sliu,

It is easy with the SQL Procedure. You can put the count in a macro variable like this:

proc sql noprint;
select count(distinct group)
into :count_of_groups
from your_original_table;
quit;

proc surveyselect .... sampsize=&count_of_Groups ...;
run;

Best regards,

Yoba
Not applicable
Posts: 0