BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hello,

Here is my question. My data has stratified structure. And I want to create some stratified random samples from my data.

Suppose my original data look like this: I have three groups of people. Group 1 and group 3 has two people in it and group 2 has three people in it. For each people I have three variables: X, Y, and Z.

Group ID X Y Z
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3

I want to select a random sample of groups with replacement, say 1,1,2, or 1, 3, 3. Then keep all observations within that group. So the sample I create should look this way (1, 1, 2):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
1 11 3 8 9 2
1 12 4 10 16 2
2 21 1 5 6 3
2 22 2 7 7 3
2 23 5 6 12 3

Note that I need a variable (i.e., NewGroup) indicating that the first and second two lines belong to different units.

Or (1, 3, 3):
Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
3 31 8 6 7 2
3 32 9 4 3 2
3 31 8 6 7 3
3 32 9 4 3 3

I am wondering how to get this type of random samples.

My second question is similar but slightly more complicated. In the first step I get a random sample of groups with replacement, say (1, 1, 2). In the second step, I randomly select people from the every group I get with replacement, say ((11, 12), (12, 12), (21, 22, 22)), or ((11, 11), (11, 12), (21, 21, 23)). And my output data should look like this ((11, 12), (12, 12), (21, 22, 22)):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 12 4 10 16 1
1 12 4 10 16 2
1 12 4 10 16 2
2 21 1 5 6 3
2 22 2 7 7 3
2 22 2 7 7 3

Or ((11, 11), (11, 12), (21, 21, 23)):

Group ID X Y Z NewGroup
1 11 3 8 9 1
1 11 3 8 9 1
1 11 3 8 9 2
1 12 4 10 16 2
2 21 1 5 6 3
2 21 1 5 6 3
2 23 5 6 12 3

How can I achieve that?

Thank you very much for your help!
6 REPLIES 6
deleted_user
Not applicable
Hello Sliu,

Could you take a look at this piece of code?

data T01_data;
infile cards;
input group id x y z;
cards;
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3
;
run;

* First question;

proc sql;
create table T02_groups as
select distinct group
from T01_data;
quit;

proc surveyselect data=T02_groups
out=T03_sample(drop=numberHits)
sampsize=3 method=urs outhits;
run;

data T04_newgroup;
set T03_sample;
newGroup+1;
run;

proc sql;
create table T05_final as
select A.*, B.newGroup
from T01_data A, T04_newGroup B
where A.group=B.group;
quit;

For the second question ... how does it come that you get 2 observations, 2 observations and then 3 observations in ((11, 12), (12, 12), (21, 22, 22)) ? I don't get it,

Regards,

Yoba
deleted_user
Not applicable
Hi Yoba,

Thanks for your code! I will take a look.

For your question: in my original data group 2 has three observations. So when I select (1, 1, 2), I want to have correspondingly two obs in newgroup 1 (originally from group 1), two obs in newgroup 2 (originally from group 1), and three obs in newgroup 3 (original from group 2). Does this help? Sorry, I did not make it clear.

Songqi
deleted_user
Not applicable
Hello Songqi,

I'm not too sure whether I understood what had to be done. Nevertheless, this was great fun 😉

Regards,

Yoba

* Second question;

proc sql;
create table T02_groups as
select group, count(*) as count
from T01_data
group by 1;
quit;

proc surveyselect data=T02_groups
out=T03_sample(rename=(count=_NSIZE_) drop=numberHits)
sampsize=3 method=urs outhits;
run;

data T04_newGroup;
set T03_sample;
newGroup+1;
run;

proc sql;
create table T05_data as
select A.*, B.newGroup
from T01_data A, T04_newGroup B
where A.group=B.group
order by group, newGroup;
quit;

proc surveyselect data=T05_data
sampsize=T04_newGroup method=urs outhits
out=T06_sample_sample;
strata group newGroup;
run;
deleted_user
Not applicable
Hello yoba,

This works! Thanks a lot!

The only change I want to make is to change the line:
sampsize=3 to sampsize="number of groups in the original dataset"
So, when I have more groups, the code will be more flexible.

For example, when my data have five groups of people:
1 11 3 8 9
1 12 4 10 16
2 21 1 5 6
2 22 2 7 7
2 23 5 6 12
3 31 8 6 7
3 32 9 4 3
4 41 2 8 11
4 42 3 4 5
4 43 5 3 4
4 44 6 7 10
5 51 4 3 2
5 52 3 4 9
I don't need to change the sentence to "sampsize=5".
Can you do that? Thank you!
deleted_user
Not applicable
Hello Sliu,

It is easy with the SQL Procedure. You can put the count in a macro variable like this:

proc sql noprint;
select count(distinct group)
into :count_of_groups
from your_original_table;
quit;

proc surveyselect .... sampsize=&count_of_Groups ...;
run;

Best regards,

Yoba
deleted_user
Not applicable
Thank you, yoba! I really appreciate it!
Just started to learn SAS syntax, it is fun!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 838 views
  • 0 likes
  • 1 in conversation